Post Facto Polymorphism ®☺ is the Existential Programming technique of mapping an abstraction, defined after the fact (i.e. after actual object instances, relational database entities, or semantic network relations already exist), onto multiple preexisting ontologies (i.e. class/type definitions).
For example, suppose that there were several data sources that held data (specifically names) about "people", and these data sources were culturally specific. A typical European ontology/DB-schema might define 3 fields: firstName, middleName, lastName. A Native American oriented database might define a single Name field. An Asian database might have a familyName and a childName field. Yet another ethnic database encoded names into the European schema but understood that firstName was the "family name" for its data rather than lastName. If we wish to consolidate these data sources and define (after the fact) a new abstraction to work with all of them, we can use the external abstraction technique to adapt each of these apples and oranges to all be "fruit".
An AbstractName interface could be defined, and adapters defined for each "concrete class" that implement the mapping between that class' name-related properties and the properties of AbstractName. By having objects be "existential", thus allowing mixin adapter code to be dynamically added, the objects can all be accessed via the getFamilyName() and getTheNamePeopleCallMe() methods defined by AbstractName.
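To make that concrete, here is a minimal Java sketch of the idea. Only the two getters come from the description above; the record and adapter names (EuropeanRecord, etc.) are invented for illustration.

// The after-the-fact abstraction that all name schemas get mapped onto.
interface AbstractName {
    String getFamilyName();
    String getTheNamePeopleCallMe();
}

// A preexisting "European" schema: firstName, middleName, lastName.
class EuropeanRecord {
    String firstName, middleName, lastName;
}

// Adapter mapping the European schema onto AbstractName after the fact.
class EuropeanNameAdapter implements AbstractName {
    private final EuropeanRecord r;
    EuropeanNameAdapter(EuropeanRecord r) { this.r = r; }
    public String getFamilyName()          { return r.lastName; }
    public String getTheNamePeopleCallMe() { return r.firstName; }
}

// The variant that stored the family name in firstName needs only a
// different adapter; the underlying data is untouched.
class SwappedNameAdapter implements AbstractName {
    private final EuropeanRecord r;
    SwappedNameAdapter(EuropeanRecord r) { this.r = r; }
    public String getFamilyName()          { return r.firstName; }
    public String getTheNamePeopleCallMe() { return r.lastName; }
}

Adapters for the single-Name schema or the familyName/childName schema would follow the same pattern.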
Of course, all this support for "cleaning up" data via external abstraction should not lessen the pressure to clean up the source data as much as possible (by which I mean cleaning up the schema and entity definitions more than the data itself). Otherwise, multiple "bad" data sources, each having their own classes/interfaces/source-of-record IDs/"package IDs", would need to be specified when referencing data in order to disambiguate which data source is desired. By cleaning up the schemas (to make the semantics more well-defined), fewer explicit "package ID" references are required, and one's code is much simpler/cleaner.
Saturday, October 7, 2006
External Abstractions, Existential Classes, Object Views
Traditionally, object oriented modeling proceeds "top down" in that classes/interfaces are defined first and then instances (i.e. "objects") are created. Essence Precedes Existence. The objects consist of the class-defined attribute set, the whole class-defined attribute set, and nothing but the class-defined attribute set!
In Existential Programming, the developer can define "bottom up" classes/interfaces where, after the fact, attributes can be chosen from existing ones, or new "computed" attributes can be added. Existence Precedes Essence. This is very similar to the database concept of a "view", where some subset of columns can be dealt with as if they were their own table/entity. These views can also add "computed" or "joined" columns, thus supporting the philosophy that there is a fine (or non-existent) line between "behavior" and "data". They are all just properties; the details of what it takes to get or set a property (using Java-speak) live inside the black box.
A difference between this object view idea and normal interfaces is that the interface declaring a method has to be defined before the class that implements it is defined, which in turn is before an object (based on that class definition) is created. With a view, an object could be "cast" to some defined-after-object-creation interface, as long as that object had the methods/properties that the view expected. Otherwise, potentially a cast-exception could be thrown (or, it could just let you work with any methods that *did* match). Of course, with Existential Programming, all "interfaces" and "classes" are the same as "views". There is no single "real" version of an object/entity with the other versions being mere "views". This is different from database views where down deep there IS a "real" table (or several if the view was really a "join"). With Existential Programming's strategy of using an EAV structure for persistence, all entity definitions / classes / interfaces are "views".
This sort of thing is done with dynamic languages like JavaScript and is called "duck typing". This name reflects 2 puns simultaneously:
- "quacks-like-a-duck" typing (vs strong or weak typing)
- duct taping (which is what all weakly typed programming is in the opinion of many)
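Java's dynamic proxies can approximate the view "cast" described above. The sketch below is not a real API (Views.cast is a made-up helper, and it only forwards to public methods); it checks up front that the target "quacks like" the view and throws a cast-exception otherwise:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

final class Views {
    // "Cast" any object to an interface defined after the object was created.
    @SuppressWarnings("unchecked")
    static <T> T cast(Object target, Class<T> view) {
        // Up-front check: does the target quack like the view?
        for (Method m : view.getMethods()) {
            try {
                target.getClass().getMethod(m.getName(), m.getParameterTypes());
            } catch (NoSuchMethodException e) {
                throw new ClassCastException(target.getClass().getName()
                        + " has no " + m.getName());
            }
        }
        // Forward each view-method call to the matching target method.
        InvocationHandler h = (proxy, method, args) -> target.getClass()
                .getMethod(method.getName(), method.getParameterTypes())
                .invoke(target, args);
        return (T) Proxy.newProxyInstance(view.getClassLoader(),
                new Class<?>[] { view }, h);
    }
}

A looser, more duck-typed variant could skip the up-front check and fail (or no-op) only when an unmatched method is actually called.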
External Abstraction ®☺
A related idea I've had is that of the "external abstraction" where, instead of merely views, whole hierarchies of super-classes, subclasses, and interfaces are defined after the fact. ["Internal" abstractions would be those normal OO abstractions where the super-classes and interfaces supported by an object were those that were defined when the object instance was created (i.e. its essence).]
The point of "external" abstractions is that someone "external" (i.e. other than the ontology creator and object instantiator) can decide (after an object has been created) that some aspect of the object is overly specific to one point of view and needs to support other points of view. For example, the "name" attribute of "person" classes is often broken into first name, middle name, last name which is very culturally biased. So, support needs to be added for other cultures where "last name" is not synonymous with "family name", and there is no "Christian name", and not everyone has three names. Is Dances with Wolves' middle name "with"?
Normally, if one were refactoring that class hierarchy to better handle names, one would convert the 3 string fields representing the name into a reference to a new Name interface and define classes that implement various permutations of name. The interface could have methods like getFamilyName() which determined the value based on the particular cultural variant name class instantiated. This is all nice when one can do this beforehand such that it is all compiled and ready for the code to use before the person object is instantiated. With External Abstraction, the developer can do this refactoring and apply it to an entity that has already been created and exists as an object or persisted to some data store. By adding in a newly created "mixin" interface/class to the preexisting object, the 3 name fields can be reinterpreted and mapped to other ontologies supporting other cultures.
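A rough sketch of what "adding a mixin to a preexisting object" might look like, reusing the AbstractName interface sketched earlier on this page; the ExistentialObject class and its method names are hypothetical:

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ExistentialObject {
    // The raw, class-less attribute set (EAV-style).
    final Map<String, Object> properties = new HashMap<>();
    // Views/mixins attached after the fact.
    private final Map<Class<?>, Object> mixins = new HashMap<>();

    // Attach an adapter built from this object's raw properties.
    <T> void mixin(Class<T> view, Function<ExistentialObject, T> factory) {
        mixins.put(view, factory.apply(this));
    }

    // Reinterpret the object through a previously attached view.
    <T> T as(Class<T> view) {
        return view.cast(mixins.get(view));
    }
}

// Usage: reinterpret three preexisting name fields through AbstractName:
//   person.mixin(AbstractName.class, p -> new AbstractName() {
//       public String getFamilyName()          { return (String) p.properties.get("lastName"); }
//       public String getTheNamePeopleCallMe() { return (String) p.properties.get("firstName"); }
//   });
//   person.as(AbstractName.class).getFamilyName();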
Labels:
abstraction,
epiphanies,
existential programming,
language,
polymorphism,
POSTSCRIPT,
types,
views
Monday, July 17, 2006
Existential Programming as Quantum States
In reading about Quantum States in Wikipedia...
"In quantum physics, a quantum state is a mathematical object that fully describes a Quantum system. One typically imagines some experimental apparatus and procedure which "prepares" this quantum state; the mathematical object then reflects the setup of the apparatus. Quantum states can be statistically mixed, corresponding to an experiment involving a random change of the parameters. States obtained in this way are called mixed states, as opposed to pure states, which cannot be described as a mixture of others. When performing a certain measurement on a quantum state, the result generally described by a probability distribution, and the form that this distribution takes is completely determined by the quantum state and the observable describing the measurement. However, unlike in classical mechanics, the result of a measurement on even a pure quantum state is only determined probabilistically. This reflects a core difference between classical and quantum physics.
Mathematically, a pure quantum state is typically represented by a vector in a Hilbert space. In physics, bra-ket notation is often used to denote such vectors. Linear combinations (superpositions) of vectors can describe interference phenomena. Mixed quantum states are described by density matrices."
...I was struck by the analogy with Existential Programming which proposes that objects hold multiple values for various properties (and in fact multiple sets of properties, hence, multiple ontologies) simultaneously.
Unlike Quantum States however, reading one set of values doesn't make the other sets vanish! ;-)
Labels:
epiphanies,
existential programming,
fuzzy,
mathematics,
ontologies,
quantum,
vector space
Saturday, June 24, 2006
Notes on Ontology Tools
While reading about ontology tools[2], I found a tool (Hozo) that supports "roles" explicitly (see #17 of the original Existential Programming epiphanies, and a role case study). Hozo separates "role concepts" from "basic concepts"; however, the tool allows each role attribute to be mapped to a basic concept attribute. I see that an existential programming language should allow "roles" to "inherit" from their "roleholder" ala subclass inheritance without actually being a subclass. [Hmmm... in an existential programming language, where all "classes" were effectively mixins anyway, how would roles be different?]
From[3], seeing CYC's concepts of #$is-a versus #$genls reminds me of a discussion I had back in 2002 with the Protege 2000 folks at Stanford who produced a Wine ontology, where I wanted to have no distinction between classes and instances because I wanted a hierarchy like wine->reds->shiraz->Rosemont->vintage94->bottle#123. I.E. something considered a leaf on the tree might later be a node with children itself. Protege would only allow variables to take on values that were "instances" and I wanted to put "chardonnay" (a subclass) as the value of a "wine variety" property. SO, is there no difference between classes and objects, or should the "value" of an attribute be able to contain a "class" reference?
From[4], seeing the Semantic Web's layer cake, I see that my ideas about recording "says who?" and "how reliable are you?" seem similar to the "trust layer". [Ed. note 11-23-07: like maybe you read this stuff years ago and it was the subliminal seed of this "says who" epiphany?]
[1] Tutorial on Ontological Engineering: Part 2: Ontology Development, Tools and Languages, Riichiro Mizoguchi, 2004
[2] ibid, Page 14, Fig. 2.
[3] ibid, Page 15, Section 3.1
[4] ibid, Page 23
Labels:
epiphanies,
ontologies,
roles
Saturday, June 17, 2006
Philosopher's Toolkit
In the handy Philosopher's Toolkit book[1], there is a section[2] explaining the difference between "categorical" statements and "modal" statements. In reading it, I see that some of my intuitions about the assumptions implicit in the object oriented programming model (e.g. "what time period was this data true?", "says who?", etc) were actually a recognition that OO models contain "categorical" assertions and do not (without explicit programming) support "modalities" (temporal modality, intensional logics, etc.). E.G. there is no "date range" associated with each attribute's value.
In another section[3], Leibniz's law of identity (which says that A is the same "thing" as B if all attributes of A are equal to their corresponding attributes in B) relates to my epiphany #5. But which set of properties is necessary/sufficient to claim a match? It depends on the ontology. Consider "cross temporal identity"...the river of today vs the river of yesterday..."molecules" vs "water". For people, the properties are often not used to identify them; instead, a "continuity of memory" connects yesterday's YOU with today's YOU.
In another section[4], the difference between "types" and "tokens" are discussed. Type is an analog of "class". Token is an analog of "object". Type-identical is an analog of "instanceOf". Token-identical is an analog of "address-of(A) == address-of(B)".
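The type/token analogy translates directly into Java (a trivial sketch):

// Type vs token, in Java terms.
public class TypeToken {
    public static void main(String[] args) {
        String a = new String("duck");
        String b = new String("duck");
        System.out.println(a.getClass() == b.getClass()); // type-identical: true
        System.out.println(a == b);       // token-identical (same address): false
        System.out.println(a.equals(b));  // equal values, but still two tokens
    }
}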
[1] The Philosopher's Toolkit, Julian Baggini and Peter S. Fosl, Blackwell Publishers, 1st Ed., 2003, ISBN: 0631228748
[2] ibid, section 4.4
[3] ibid, section 3.6
[4] ibid, section 4.17
Labels:
philosopher's toolkit,
philosophy
Wednesday, June 7, 2006
Ontology Merging Strategy
Summarizing the posts from yesterday, there is a general problem of "things" in one ontology/data model not mapping (in a definite way) to "things" in another model. How to support (or even automate) mapping from one model to another? I.E. how to facilitate "transformation" from one "basis" to another?
A strategy at the heart of an existential programming language could be to reduce entities to their most atomic level: semantic relations between an entity and a single attribute. Use "identity" algorithms to reconstitute these atoms into "things".
A new language that did this and integrated multiple sources of data (OO data, E/R relational data, semantic networks, web-search-results) could create a single seamless framework and data continuum.
[Ed. Note: as found 10/29/07, others have had similar ideas.]
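A minimal Java sketch of that atomize-and-reconstitute strategy (the Assertion record and the pluggable identity function are invented names; assumes a recent Java with records):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// The atomic unit: one entity, one attribute, one value (an EAV triple).
record Assertion(String entityId, String attribute, Object value) {}

final class Reconstitute {
    // Group atoms back into "things" using a caller-chosen identity algorithm.
    static Map<String, List<Assertion>> things(List<Assertion> atoms,
            Function<Assertion, String> identity) {
        Map<String, List<Assertion>> out = new LinkedHashMap<>();
        for (Assertion a : atoms) {
            out.computeIfAbsent(identity.apply(a), k -> new ArrayList<>()).add(a);
        }
        return out;
    }
}

Passing Assertion::entityId as the identity function reproduces the trivial case where atoms already share a key; a smarter identity algorithm could match atoms on attribute-value evidence instead.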
Ontology Mismatch Case Study: Customers & Obligors
Here is a real world example (from a major bank) of the problems of ontology mismatches between different silo systems whose data must nevertheless be integrated. Some systems have the concept of "customer" and implement a customer entity and customer key. Other systems (which do not talk to each other, i.e. there is no universal "system of record" for person or legal entity) have a customer concept, but they are distributed geographically and have a different key for each state or regional location, and they are called "obligors". So, customer-to-obligor should be a simple one-to-many relationship.
However, since errors are made by the automated contact address parsing algorithms that try to figure out which customer is associated with which obligor, multiple customers can be associated with a single obligor. Hence, customers and obligors have a many-to-many relationship, and therefore, customers are many-to-many within themselves! Obligors are many-to-many within themselves! Customers not only have duplicates for the same person, they don't always represent a definite person or even set of definite people. They are vague and refer to parts of multiple people. Customers are effectively anything with a customer ID! Very existential.
A particular obligor (which, again, should be a particular customer in a particular location) was linked with three customers: JoeBlow, JaneBlow, a-customer-with-Jane's-name-and-Joe's-SSN! To make things worse, the attempt to clean up customers by defining them as a role of a "legal entity" didn't work in this case because the "customer" was really a married-household which was not a "legal entity" because it doesn't have its own tax id! Even worse, the rationale that legal entities are those things that are separately liable for money demands ignores the fact that both parties in a married household are liable (but even then differing on a state by state basis). Whew!
Labels:
bank,
case study,
existential programming,
fuzzy,
ontologies,
roles
What is "Identity" in OOP & ER & SN Data Models?
When trying to map Object Oriented Programming models, to Entity Relationship models, to Semantic Network models, how does the philosophical concept of "identity" get handled? I.E. how is a "thing" identified in each model? [And the following assumes that incorrect criteria are not used, e.g. using the "name" of the thing as its "identifier".]
- OOP models assume that the "object pointer" (or object "handle") is a global unique identifier (GUID) for the "thing" represented by that object instance of that class.
- E/R models assume that there is either an opaque key (ala sequence numbers) or some set of attributes whose combined values form a GUID for the "thing" represented by that row of that table.
- S/N models assume that there is some explicit or internal key associated with each "entity".
Labels:
equals,
existential programming,
identity
Object Orientation's Ontological Assumptions
Once one realizes that Object Oriented Programming is isomorphic with Semantic Networks[1][2], and one is cognizant of the meta-data it takes to represent imperfect data from a variety of sources (e.g. data mining the WWW), it becomes clear that OOP makes several large assumptions when modeling the world. These assumptions lie at the root of many problems mapping OO models to relational E/R data models.
The Class hierarchy defined in an OO program represents a model of entities, their attributes, and their relationships with other entities; i.e. an Ontology. Unlike modern semantic network approaches, where it is clear that a multiplicity of ontologies must be recognized and mediated between, OO Classes implicitly assume that they are "the only model", "the correct model", "the universal model". Some assumptions of OO, as normally practiced, are...
- Only a single ontology is supported. OOP needs a way of mixing Class hierarchies where each is a different perspective on the same "thing(s)".
- No model exists for describing the author of the ontology. It is potentially implied by its [Java] package name (when that concept applies), but as far as other attributes of the author, there is no way to represent the "reliability" of the author, or of this particular model, or of a particular set of data values associated with this model.
- No model of whether particular values of Object attributes are "true", "up to date", "not vague", "not fuzzy" (i.e. clusters of possible points with probabilities for each point).
- No concept of object instances overlapping; each object either exists or not; objects don't "partially overlap" each other; objects exist in a single place in a single "copy". In other words, OOP doesn't distinguish between "a thing" and some number of (potentially imprecise) "representations" of that thing.
- The Class hierarchy is assumed to be the only way to classify/divide the world into "things" (anyway, at least the "things" that those classes model).
- An instance of Class X is assumed to be a member of the set of all Xs in the world. I.E. OOP has no way to create an object instance while leaving open whether it belongs to the class of all Xs; membership is fixed by the class it was instantiated from at birth. OOP doesn't support an agnostic attitude towards class/type membership. In still other words, Essence precedes Existence!
- The values of all entity attributes (aka an object instance) are assumed to be available in a single contiguous location. I.E. OOP can't normally handle attribute values being spread all over creation (as would be the case for data mined about someone via web page searches). OOP can't normally handle taking widely different amounts of time to retrieve different attributes (as would be the case in data mining operations).
[2] http://www.semanticresearch.com/semantic/index.php
Labels:
accidental/essential,
equals,
existential programming,
fuzzy,
ontologies
Three Levels of "Existential-ness" Support?
In thinking about how one would build "a language" and/or tools to support Existential Programming, there seem to be three increasing levels into which to sort features.
Level I - Model Mapping
- Make it easy to map Object-Oriented models to Entity-Relationship models to Semantic-Network models. I.E. implement OO persistence layer in the style of the EAV approach to semantic network databases. Implement auto-translation of data in traditional E/R tables into EAV records. Implement auto loading of data into OO model from arbitrary EAV tuples (and therefore arbitrary relational tables). In other words, automated persistence with automatic data mapping.
- Make it easy to accept ontologies and data from multiple sources; i.e. not just relational database. Example data sources could be: Web searches, Enterprise Silo systems, etc. In other words, build common adapters and mediators to broaden the reach of the "language" beyond structured local databases.
- "Consider the source". Make it easy to associate fuzzy logic factors to data-assertions and ontology-assertions of all granularities, based on the source of the data, the ontology, and even the assertions themselves. Examples are: for any given attribute value, "say's who?", "said when", "how reliable is this source?", "how reliable is this source for this attribute?", "who says that this attribute even applies to this class of thing", "how reliable is the source about the ontology definitions?". I want to be able to encode: "Sam is 89% trustworthy about colors", "Joe lies about AGEs", "Harry is 100% reliable when he says that Joe lies about AGEs", etc.
- Make it easy to handle attribute values that are themselves fuzzy. I.E. Probabilistic attribute values, conflicting values, cluster values, vague values, time varying values, outdated values, missing values, values whose availability is defined by some set of limits on the effort expended in finding the value (e.g. find all values of phone for joe blow that can be found within 10 seconds real time).
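Here is the rough sketch promised above, covering the "consider the source" and fuzzy-value items; the field names are hypothetical, not a settled design (assumes a recent Java with records):

import java.time.Instant;
import java.util.List;

// Every attribute value carries its source, when it was said, and a
// confidence factor -- "says who?", "said when?", "how reliable?".
record Fact(String entityId, String attribute, Object value,
        String source, Instant saidWhen, double confidence) {}

final class FuzzyQuery {
    // A "getter" becomes a "find": return every known value above a
    // threshold, rather than pretending there is exactly one true value.
    static List<Fact> find(List<Fact> facts, String entityId, String attribute,
            double minConfidence) {
        return facts.stream()
                .filter(f -> f.entityId().equals(entityId))
                .filter(f -> f.attribute().equals(attribute))
                .filter(f -> f.confidence() >= minConfidence)
                .toList();
    }
}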
Labels:
existential programming,
fuzzy,
language,
tools
Tuesday, June 6, 2006
Identity() versus Equals()
To expand on item 5 in my original entry, it seems that object oriented languages need to be extended to support the following notions (a minimal sketch follows the list).
- Identity() as a separate model-definable function rather than using a single "key" in the form of an object pointer or reference. It would define whether multiple "things" are the "same thing".
- Equals() is different from Identity() because two objects being equal is not the same as "the thing this object represents" being the same as "the thing that object represents".
- Determining the membership of "object 123" in the "set of all instances of class X" could/should be via an explicit list (along with "says who?", "as of when?", etc) rather than an intrinsic property of that object.
- Class definitions are in the mind of the "viewer" and can be applied to any object. Therefore, one should be able to use a mixture of many ontologies.
- Attributes of objects should be stored independently so that they are available to all "views", "classes", "entity types", EAV tuples, etc, etc.
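Here is the minimal sketch promised above. The names (IdentityCriterion, CustomerRow) are invented for illustration; the point is only that the identity test is pluggable and separate from equals():

interface IdentityCriterion<T> {
    // Do a and b denote the same real-world "thing"?
    boolean sameThing(T a, T b);
}

class CustomerRow {
    String customerId, name, ssn;
}

final class Identities {
    // One ontology may say SSN decides identity; another might use name+DOB.
    static final IdentityCriterion<CustomerRow> BY_SSN =
            (a, b) -> a.ssn != null && a.ssn.equals(b.ssn);
}

equals() on CustomerRow could still compare all fields, while sameThing() answers the separate question of whether two rows denote the same real-world thing, per whichever ontology supplied the criterion.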
Monday, June 5, 2006
The Original Epiphanies of Existential Programming
The items below are a summary of the several AHA! moments I had over the May/June 2006 time frame. [see my std disclaimers]
It began with contemplating how Object-oriented modeling, and Entity-relationship modeling, and Semantic Network modeling are all isomorphisms of each other. Next I realized that O/O and E/R models are way too rigid because they expect a single "correct" model to work, whereas Semantic modelers pretty much know it is futile to expect everyone to use a single ontology! So, where would it take us to explore doing O/O and database development with that in mind? Next I had the intuition that Philosophy (with a capital P) probably had something to say about this topic and so I started reading Philosophy 101 books to learn at age 50 what I never took in college. It quickly became obvious that Philosophy has SO MUCH to say about these topics that it is criminal how little explicit reference to it there is in the software engineering literature.
- When mapping Object Oriented classes to semantic networks I realized that CLASSES/SUBCLASSES etc were the same as sets of semantic-relationship-triples (Entity-Attribute-Value aka EAV records) and therefore a class hierarchy formed an ontology (as used in the semantic network/web/etc world). AHA! It is futile to get everyone to agree upon ONE ontology (from my experience), SO, that is why it is a false assumption of O/O that there can/should be a single Class hierarchy. But, all O/O languages fundamentally assume this which is why they are hard to map to relational databases. Databases explicitly provide for multiple "views" of data. And in Enterprise settings, where there are often multiple models (from different stovepipe systems) of the same basic data, this causes even more of a mismatch with the single object model.
- Mapping O/O Class hierarchies to DB E/R models to Semantic Networks brings up questions about the meaning of Identity (with a capital I) and Essential vs Accidental properties. AHA! This sounds like Philosophy (which had I not started reading about before transcribing these notes into a blog, I would have not known terms like Essential and Accidental and Identity with a capital I to even use them here), SO, it would be worth learning Philosophy to improve my Software Engineering and Computer Science skills.
- Having now worked with both Java and Javascript deeply enough to understand class versus prototype based languages (see my AJAX articles), I see that Java is like Plato's view of the world, and Javascript is more like Existentialism (where an object can be instantiated without saying what "type" it is).
- Web pages can be thought of as a database whose data model/ontology is implied. Data mining can be done on it where the URL and the "time of last update" are added to each EAV tuple extracted from the page to extend a normal EAV "fact" with a "says who?" dimension and a temporal dimension to the database. In order to really capture all the nuances of the data mined from the web, a standard data model ala O/O or E/R models has to also add some model of:
- completeness
- accuracy
- different values at different points of time
- not only "say's who?" but "say's how?" i.e. which ontology is being used implicitly or explicitly
- only some attributes of a "thing" are being defined on any given URL
- O/O languages could/should be extended to make it easier to work with arbitrary sets of semantic network relationships/tuples such that it could handle integration of various (E/R, Enterprise, web page, data mining) data models.
- Google, Homeland Security, Corporate data warehouses all would benefit from being able to work with "everything we know about X". This could be a good technique to integrate disparate data sources.
- O/O languages need to be more like Javascript in letting any set of attributes be associated with an object and "classes" are more like "roles" or interfaces that the VIEWER chooses instead of tightly coupling the attribute set to a predefined list. The VIEW chosen by the viewer/programmer can still be type-safe once chosen BUT it can't assume the source of data used the same "view".
- "View" (see above) includes all aspects of traditional classes PLUS parameters for deciding trustworthiness, deciding the "identity" of the thing that attributes are known about, and all other "unassumable" things. An O/O language could set defaults for these parameters to match the assumptions of traditional programming languages.
- Searching the web and trying to integrate the data is much like trying to integrate the data from disparate silo systems into a single enterprise data model or data warehouse. They both need to take into account where each data value came from, how accurate/reliable those sources are, and how their ontologies map to each other and accumulate attributes from different sources about the same entity.
- When dealing with the sort of non-precise, non-reliable values of object properties as found on the web, the following are needed as a part of the "ontology" defined to work with that data (a sketch of the probabilistic-equality idea follows at the end of this post):
- Equality test should return a decimal probability (0..1) rather than a true/false value
- Find/Search operations should allow specification of thresholds to filter results
- Property "getters" become the same as "find" operations
- The result of a get/find is a set of values, each including a source-of-record & time/space region, i.e. says who?, and when and where was this true?
- Property "setters" should accept parameters for source-of-record-spec, time/space region, data freshness, as well as probability factor, or other means of specifying cluster values, vague values, etc.
- Multiple levels of granularity with regard to setting probability of truth values for entire source-of-record as well as for individual "fact"
- How to handle deciding what a thing is? What "level" of abstraction/reality is it on? E.G. an asteroid is a loose collection of pebbles, but that means that the parts of something don't always "touch" the thing. i.e. What is the real difference between the following:
- x is a part of y
- x is touching y
- x and y are in the set S
- How are attribute values of null to be interpreted? What is the difference between "has definitely no value" and "don't know the value"? Attributes of X (according to some given ontology) are either:
- Identity Criteria
- Required as Essential
- known as possible (but optional)
- unanticipated/unknown (but a value was found)
- unanticipated and not found (i.e. not conceived of)
- It is a big deal to understand the borderline between the set of "thing"s (aka entity, object) and the set of "value"s (e.g. 1,2,3,a,b,c,true,false,etc) especially when many OO languages represent them all with "objects".
- It is a big deal to handle the problem where ontologies mismatch each other with regard to "what is a thing" and "where does one thing end and another one begin". E.G.
- parts of A == parts of B but A<>B
- overlapping things like jigsaw puzzle pieces vs the objects in the completed puzzle picture
- a de facto Customer record that does not equal a "person" because the name belonged to one person but the SSN belonged to another. On the other hand, if the "customer" can really be "a married household" but the system can't handle that, then this customer record is not overlapping people, it is just incomplete. On the other other hand, how do the customer records for the husband and wife jibe with the "household"?
- There are attributes of an entity and there are "meta-attributes", e.g. an EAV tuple of an attribute could be (object123,color,green) [where "color" and "green" should be defined in the ontology in question.] Meta-attributes could be...
- "which ontology is this based on?", (i.e. "whose definitions are we using?")
- "says who?", (source of the data)
- "and when was it said?", (date source was queried)
- "over what period of time was it green?" (because values change over time)
- If objects can have arbitrary collections of attributes, and they are not any definite "thing", then how do you know what-is-a/when-to-create-a-new-instance-of-the "thing"?? And where does one "thing" end and the next one begin?
- Intuitively, people agree on when one person begins and another person ends even if we can't define how/why. This is not true of abstract concepts. Modeling should find the easy to recognize real-world entities and use them in preference to concepts (which are often roles anyway, like customer or prospect or employee).
- People "know" other people (i.e. recognize them later) via shared "events" which both can verify to each other. [Just like the shared PIN# secret between you and the bank. And now increasingly asking all sorts of personal questions like whats your favorite movie?]
Labels:
accidental/essential,
epiphanies,
existential programming,
fuzzy,
origins,
POSTSCRIPT,
roles
Wednesday, April 19, 2006
There is no such thing as common sense
"Common sense is the collection of prejudices acquired by age eighteen."
-Einstein
"The man who has no tincture of philosophy goes through life imprisoned in the prejudices derived from common sense, from the habitual beliefs of his age or his nation, and from convictions which have grown up in his mind without the co-operation or consent of his deliberate reason."-Bertrand Russell
"unencumbered by the thought process"
- motto of radio show Car Talk
This is a rant page, creating a location I can point to, expounding my claim that there is no such thing as common sense, and if you think that there is then you haven't gotten out into the world enough. The only reason common sense seems common is that you've only dealt with people very much like you. There are other web pages that take a similar stance.
"To determine the value of philosophy, we must first free our minds from the prejudices of what are wrongly called 'practical' men."-Bertrand Russell
A secondary belief of mine is that, real or not, so-called common sense should not be the answer to the question "Why?" or "How?". It is an answer that is a get-out-of-jail-free card for non-thinkers. It allows them to make decisions Unencumbered By The Thought Process. I've always thought this (and it doesn't matter that I might have Asperger's... LOL), but I was bolstered in my belief when I found out that most Philosophers think so little of it that they have a name for it (which isn't a compliment): Naive Realism.
See/Hear the "Making Decisions" show of Philosophy Talk where the guest talks about the documented value of skeptics & contrarians in group settings, not because they are right or wrong, but because they free up people to go against group think who otherwise wouldn't speak up. Better for decisions even if more of a pain in the butt for those going thru the process.
Episode 12 (Probability and Modern Science) of the TTC Video Series Mathematics, Philosophy, and the 'Real World' gives a couple of examples of naive intuition as a bad thing…
- A study from 1980 is mentioned where a number of medical doctors were asked to translate the word "likely" (as in "likely to have a disease") into a percentage (as in "percent chance of having the disease"). Amazingly, the answers ranged from 20% to 95%.
- An example is given of people's intuition about statistics being very wrong where a jury thinks there is a good chance that a witness is correct when he says he saw a blue car at night. FACTS: (1) witness was tested as being 80% correct when reporting car color at night, AND, (2) there are 15% blue cars and 85% green cars in population. In fact the probability that he was wrong is over half!
Because of so many green cars in total population, the witness will wrongly identify more green cars as blue (17% of total population) [i.e. 20% wrong of 85% green; 85x.2=17] than correctly identify blue cars as blue (12% of total population) [i.e. 80% right of 15% blue; 15x.8=12].
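Checking that arithmetic with a few lines of Java:

public class BlueCarWitness {
    public static void main(String[] args) {
        double blue = 0.15, green = 0.85, accuracy = 0.80;
        double saysBlueIsBlue  = blue  * accuracy;        // 0.12 of population
        double saysBlueIsGreen = green * (1 - accuracy);  // 0.17 of population
        double pWrongGivenSaysBlue =
                saysBlueIsGreen / (saysBlueIsBlue + saysBlueIsGreen);
        System.out.println(pWrongGivenSaysBlue); // ~0.586 -- "over half", as claimed
    }
}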
Labels:
philosophy,
rant
Thursday, April 13, 2006
Do Objects Have Souls?
While writing the article, Implementing "Real" Classes in JavaScript for Developer.com, I was tempted to add a sidebar with the provocative title "Do Objects Have Souls?". The article itself demonstrated a technique for simulating Java-like classes in JavaScript, and as introductory material, it explained the difference between Java's "class"-based semantics versus JavaScript's "object prototype"-based semantics. In trying to understand the differences myself, I began musing on the parallels between philosophical notions of "the soul" and JavaScript's empty shell of an "object" that is generated by obj = new Object;
Souls as property containers
In western philosophy there is a 2,500-year-old school of thought that "things" (aka objects) have properties, some of which can never change (i.e. essential properties) versus those which may change over time (i.e. accidental properties). One concept of "soul" is that it is the bundle of essential properties that constitute a thing. This idea has also been equated with "identity". Attached (non-permanently) to the attribute bundle are the various accidental properties. This sounds a lot like the "empty" JavaScript object which is ready to add and update and delete [accidental] properties, while all the time keeping constant the essential, unchanging, object identity (as referenced by obj!).
When medieval alchemists distilled liquids into their essences they called them spirits because they were the "soul" of the grape, herb, flower, etc. To ensure removing all of the accidental properties, they distilled things 5 times to produce the quintessential spirit. Distilled spirits in the alcohol sense often have names that reflect the notion that they have a life essence captured in them. Whiskey and Aquavit are both names that translate into "water of life" in their original languages. In the movie Perfume, a villain repeatedly attempts to distill the essence of pretty women using the same techniques as distilling flowers into perfume. [Spoiler Alert: flowers don't survive the process...]
When the Well of Souls runs out of RAM
Another aspect of souls that rhymes with JavaScript is the ancient lore that newborns are given a soul at birth which is plucked from a "well of souls" (aka the chamber of Guf). In JavaScript, as empty objects are created and given an identity, they are plucked from a heap of available memory (i.e. dynamic memory allocation). In both cases, bad things happen when there are none left.
When the well of souls runs dry, the Messiah will come and reboot the world; when your browser runs out of heap space, your JavaScript will gag and someone will have to come and reboot the browser (or at least the web page). The plot of the 1988 Demi Moore film, The Seventh Sign, is based on the Guf mythology. Demi's baby is due to be born on February 29 which is the date on which the last soul will leave the Guf and it will be empty.