Saturday, October 7, 2006
For example, suppose that there were several data sources that held data (specifically names) about "people", and these data sources were culturally specific. A typical European ontology/DB-schema might define 3 fields: firstName, middleName, lastName. A Native American oriented database might define a single Name field. An Asian database might have a familyName and a childName field. Yet another ethnic database encoded names into the European schema but understood that firstName was the "family name" for its data rather than lastName. If we wish to consolidate these data sources and define (after the fact) a new abstraction to work with all of them, we can use the external abstraction technique to adapt each of these apples and oranges to all be "fruit".
An AbstractName interface could be defined, and adapters defined for each "concrete class" that implement the mapping between that class' name-related properties and the properties of AbstractName. By having objects be "existential", thus allowing mixin adapter code to be dynamically added, the objects can all be accessed via the getFamilyName() and getTheNamePeopleCallMe() methods defined by AbstractName.
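As a minimal Java sketch of those adapters (EuropeanPerson and the field mappings are illustrative assumptions, not any actual source schema):

```java
// AbstractName is the after-the-fact abstraction; adapters map each
// culturally specific schema onto it. EuropeanPerson is a hypothetical
// stand-in for a source record.
interface AbstractName {
    String getFamilyName();
    String getTheNamePeopleCallMe();
}

class EuropeanPerson {
    String firstName, middleName, lastName;
}

// Adapter for the standard European schema: lastName is the family name.
class EuropeanNameAdapter implements AbstractName {
    private final EuropeanPerson p;
    EuropeanNameAdapter(EuropeanPerson p) { this.p = p; }
    public String getFamilyName()          { return p.lastName; }
    public String getTheNamePeopleCallMe() { return p.firstName; }
}

// Adapter for the "flipped" source that stored the family name in firstName.
class FlippedNameAdapter implements AbstractName {
    private final EuropeanPerson p;
    FlippedNameAdapter(EuropeanPerson p) { this.p = p; }
    public String getFamilyName()          { return p.firstName; }
    public String getTheNamePeopleCallMe() { return p.lastName; }
}
```

Consolidation code then sees only AbstractName; each apples-and-oranges source presents itself as "fruit".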
Of course, all this support for "cleaning up" data via external abstraction should not lessen the pressure to clean up the source data as much as possible (by which I mean cleaning up the schema and entity definitions more than the data itself). Otherwise, multiple "bad" data sources, each having their own classes/interfaces/source-of-record IDs/"package IDs", would need to be specified when referencing data in order to disambiguate which data source is desired. By cleaning up the schemas (to make the semantics more well-defined), fewer explicit "package ID" references are required, and one's code is much simpler/cleaner.
In Existential Programming, the developer can define "bottom up" classes/interfaces where, after the fact, attributes can be chosen from existing ones, or new "computed" attributes can be added. Existence Precedes Essence. This is very similar to the database concept of a "view", where some subset of columns can be dealt with as if they were their own table/entity. These views can also add "computed" or "joined" columns, thus supporting the philosophy that there is a fine (or non-existent) line between "behavior" and "data". They are all just properties, and the details of what is required to "get" or "set" a property live inside the black box (in Java-speak).
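For instance, here is a minimal sketch (assuming stored firstName/lastName properties) of an object "view" whose computed property is indistinguishable from a stored one:

```java
// A "computed column" on an object view: getFullName() is behavior, but a
// caller cannot tell it apart from stored data; that is the black box.
interface PersonView {
    String getFirstName();
    String getLastName();
    default String getFullName() {
        return getFirstName() + " " + getLastName();
    }
}
```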
A difference between this object view idea and normal interfaces is that the interface declaring a method has to be defined before the class that implements it is defined, which in turn is before an object (based on that class definition) is created. With a view, an object could be "cast" to some defined-after-object-creation interface, as long as that object had the methods/properties that the view expected. Otherwise, a cast-exception could potentially be thrown (or, it could just let you work with any methods that *did* match); see the sketch after the list below. Of course, with Existential Programming, all "interfaces" and "classes" are the same as "views". There is no single "real" version of an object/entity with other versions just "views". This is different from database views where down deep there IS a "real" table (or several if the view was really a "join"). With Existential Programming's strategy of using an EAV structure for persistence, all entity definitions / classes / interfaces are "views".
- "quacks-like-a-duck" typing (vs strong or weak typing)
- duct taping (which is what all weakly typed programming is in the opinion of many)
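A minimal sketch of this quacks-like-a-duck "cast to a view" in Java, using a dynamic proxy; duckCast(), the HasFamilyName view, and the Person class are all hypothetical:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class DuckCast {
    // "Cast" any object to a view interface it never declared, by matching
    // method names/signatures at call time. A call fails with an exception
    // (the analog of a cast-exception) if the target can't quack that way.
    @SuppressWarnings("unchecked")
    static <T> T duckCast(Object target, Class<T> view) {
        InvocationHandler h = (proxy, method, args) -> {
            Method m = target.getClass()
                             .getMethod(method.getName(), method.getParameterTypes());
            return m.invoke(target, args);
        };
        return (T) Proxy.newProxyInstance(
                view.getClassLoader(), new Class<?>[] { view }, h);
    }

    interface HasFamilyName { String getFamilyName(); } // defined after the fact

    static class Person {              // never declares "implements HasFamilyName"
        public String getFamilyName() { return "Blow"; }
    }

    public static void main(String[] args) {
        HasFamilyName view = duckCast(new Person(), HasFamilyName.class);
        System.out.println(view.getFamilyName()); // prints: Blow
    }
}
```

If the target has no matching method, the proxy call fails (the analog of a cast-exception); a more forgiving variant could expose only the methods that *did* match.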
External Abstraction ®☺
A related idea I've had is that of the "external abstraction" where, instead of merely views, whole hierarchies of super-classes, subclasses, and interfaces are defined after the fact. ["Internal" abstractions would be those normal OO abstractions where the super-classes and interfaces supported by an object were those that were defined when the object instance was created (i.e. its essence).]
The point of "external" abstractions is that someone "external" (i.e. other than the ontology creator and object instantiator) can decide (after an object has been created) that some aspect of the object is overly specific to one point of view and needs to support other points of view. For example, the "name" attribute of "person" classes is often broken into first name, middle name, last name which is very culturally biased. So, support needs to be added for other cultures where "last name" is not synonymous with "family name", and there is no "Christian name", and not everyone has three names. Is Dances with Wolves' middle name "with"?
Normally, if one were refactoring that class hierarchy to better handle names, one would convert the 3 string fields representing the name into a reference to a new Name interface and define classes that implement various permutations of name. The interface could have methods like getFamilyName() which determined the value based on the particular cultural-variant name class instantiated. This is all nice when one can do this beforehand such that it is all compiled and ready for the code to use before the person object is instantiated. With External Abstraction, the developer can do this refactoring and apply it to an entity that has already been created and exists as an object or persisted to some data store. By adding in a newly created "mixin" interface/class to the preexisting object, the 3 name fields can be reinterpreted and mapped to other ontologies supporting other cultures.
Monday, July 17, 2006
"Mathematically, a pure quantum state is typically represented by a vector in a Hilbert space. In physics, bra-ket notation is often used to denote such vectors. Linear combinations (superpositions) of vectors can describe interference phenomena. Mixed quantum states are described by density matrices."
Saturday, June 24, 2006
Seeing CYC's concepts of #$isa versus #$genls reminds me of a discussion I had back in 2002 with the Protege 2000 folks at Stanford who produced a Wine ontology, where I wanted to have no distinction between classes and instances because I wanted a hierarchy like wine->reds->shiraz->Rosemont->vintage94->bottle#123. I.E. something considered a leaf on the tree might later be a node with children itself. Protege would only allow variables to take on values that were "instances", and I wanted to put "chardonnay" (a subclass) as the value of a "wine variety" property. SO, is there no difference between classes and objects, or should the "value" of an attribute be able to contain a "class" reference??
Seeing the Semantic Web's layer cake, I notice that my ideas about recording "says who?" and "how reliable are you?" seem similar to the "trust layer". [Ed. note 11-23-07: like maybe you read this stuff years ago and it was the subliminal seed of this "says who" epiphany?]
 Tutorial on Ontological Engineering: Part 2: Ontology Development, Tools and Languages, Riichiro Mizoguchi, 2004
 ibid, Page 14, Fig. 2.
 ibid, Page 15, Section 3.1
 ibid, Page 23
Saturday, June 17, 2006
In another section, Leibniz's law of identity (which says that A is the same "thing" as B if all attributes of A are equal to their corresponding attributes in B) relates to my epiphany #5. But which set of properties is necessary/sufficient to claim a match? It depends on the ontology. Consider "cross-temporal identity"...the river of today vs the river of yesterday..."molecules" vs "water". For people, the properties are often not used to identify them; instead, a "continuity of memory" connects yesterday's YOU with today's YOU.
In another section, the difference between "types" and "tokens" is discussed. Type is an analog of "class". Token is an analog of "object". Type-identical is an analog of "instanceOf". Token-identical is an analog of "address-of(A) == address-of(B)".
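In Java terms (my loose analogy, not the book's), the distinction might look like:

```java
public class TypeVsToken {
    public static void main(String[] args) {
        // Two distinct tokens ("objects") of the same type ("class").
        String a = new String("river");
        String b = new String("river");
        System.out.println(a.getClass() == b.getClass()); // type-identical: true
        System.out.println(a == b);                       // token-identical: false
    }
}
```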
 The Philosopher's Toolkit, Julian Baggini and Peter S. Fosl, Blackwell Publishers, 1st Ed., 2003, ISBN: 0631228748
 ibid, section 4.4
 ibid, section 3.6
 ibid, section 4.17
Wednesday, June 7, 2006
A strategy at the heart of an existential programming language could be to reduce entities to their most atomic level: semantic relations between an entity and a single attribute. Use "identity" algorithms to reconstitute these atoms into "things".
A new language that did this and integrated multiple sources of data (OO data, E/R relational data, semantic networks, web-search-results) could create a single seamless framework and data continuum.
[Ed. Note: as found 10/29/07, others have had similar ideas.]
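A minimal sketch of the atomic-triple idea, with a toy identity rule (shared "ssn" value) standing in for a real identity algorithm:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Every fact is one atomic (entity, attribute, value) triple; an identity
// rule decides which raw IDs denote the same "thing". Matching on a shared
// "ssn" value is a toy rule, used here only to make the sketch concrete.
record Triple(String rawId, String attribute, Object value) {}

class Reconstitute {
    static Map<String, List<Triple>> byIdentity(List<Triple> atoms) {
        // Pass 1: the identity rule maps raw IDs to canonical "thing" keys.
        Map<String, String> canonical = new HashMap<>();
        for (Triple t : atoms) {
            if (t.attribute().equals("ssn")) {
                canonical.put(t.rawId(), "ssn:" + t.value());
            }
        }
        // Pass 2: regroup every atom under its canonical thing.
        Map<String, List<Triple>> things = new HashMap<>();
        for (Triple t : atoms) {
            String key = canonical.getOrDefault(t.rawId(), t.rawId());
            things.computeIfAbsent(key, k -> new ArrayList<>()).add(t);
        }
        return things;
    }
}
```

Any number of competing identity algorithms could be swapped in to regroup the same atoms into different "things".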
However, since errors are made by the automated contact address parsing algorithms that try to figure out which customer is associated with which obligor, multiple customers can be associated with a single obligor. Hence, customers and obligors have a many-to-many relationship, and therefore, customers are many-to-many within themselves! Obligors are many-to-many within themselves! Customers not only have duplicates for the same person, they don't always represent a definite person or even set of definite people. They are vague and refer to parts of multiple people. Customers are effectively anything with a customer ID! Very existential.
A particular obligor (which, again, should be a particular customer in a particular location) was linked with three customers: JoeBlow, JaneBlow, and a-customer-with-Jane's-name-and-Joe's-SSN! To make things worse, the attempt to clean up customers by defining them as a role of a "legal entity" didn't work in this case because the "customer" was really a married household, which was not a "legal entity" because it doesn't have its own tax ID! Even worse, the rationale that legal entities are those things that are separately liable for money demands ignores the fact that both parties in a married household are liable (but even then differing on a state-by-state basis). Whew!
- OOP models assume that the "object pointer" (or object "handle") is a globally unique identifier (GUID) for the "thing" represented by that object instance of that class.
- E/R models assume that there is either an opaque key (ala sequence numbers) or some set of attributes whose combined values form a GUID for the "thing" represented by that row of that table.
- S/N models assume that there is some explicit or internal key associated with each "entity".
The Class hierarchy defined in an OO program represents a model of entities, their attributes, and their relationships with other entities; i.e. an Ontology. Unlike modern semantic network approaches, where it is clear that a multiplicity of ontologies must be recognized and mediated between, OO Classes implicitly assume that they are "the only model", "the correct model", "the universal model". Some assumptions of OO, as normally practiced, are...
- Only a single ontology is supported. OOP needs a way of mixing Class hierarchies where each is a different perspective on the same "thing(s)".
- No model exists for describing the author of the ontology. It is potentially implied by its [Java] package name (when that concept applies), but as far as other attributes of the author, there is no way to represent the "reliability" of the author, or of this particular model, or of a particular set of data values associated with this model.
- No model of whether particular values of Object attributes are "true", "up to date", "not vague", "not fuzzy" (i.e. clusters of possible points with probabilities for each point).
- No concept of object instances overlapping; each object either exists or not; objects don't "partially overlap" each other; objects exist in a single place in a single "copy". In other words, OOP doesn't distinguish between "a thing" and some number of (potentially imprecise) "representations" of that thing.
- The Class hierarchy is assumed to be the only way to classify/divide the world into "things" (anyway, at least the "things" that those classes model).
- An instance of Class X is assumed to be a member of the set of all Xs in the world. I.E. OOP has no way to create an object as an instance of Class X while remaining agnostic about whether the thing it represents really is a member of the class of all Xs; membership is fixed at birth. OOP doesn't support an agnostic attitude towards class/type membership. In still other words, Essence precedes Existence!
- The values of all entity attributes (aka an object instance) are assumed to be available in a single contiguous location. I.E. OOP can't normally handle attribute values being spread all over creation (as would be the case for data mined about someone via web page searches). OOP can't normally handle taking widely different amounts of time to retrieve different attributes (as would be the case in data mining operations).
Level I - Model Mapping
- Make it easy to map Object-Oriented models to Entity-Relationship models to Semantic-Network models. I.E. implement the OO persistence layer in the style of the EAV approach to semantic network databases. Implement auto-translation of data in traditional E/R tables into EAV records. Implement auto-loading of data into the OO model from arbitrary EAV tuples (and therefore arbitrary relational tables). In other words, automated persistence with automatic data mapping (see the sketch after this list).
- Make it easy to accept ontologies and data from multiple sources; i.e. not just relational database. Example data sources could be: Web searches, Enterprise Silo systems, etc. In other words, build common adapters and mediators to broaden the reach of the "language" beyond structured local databases.
- "Consider the source". Make it easy to associate fuzzy logic factors to data-assertions and ontology-assertions of all granularities, based on the source of the data, the ontology, and even the assertions themselves. Examples are: for any given attribute value, "say's who?", "said when", "how reliable is this source?", "how reliable is this source for this attribute?", "who says that this attribute even applies to this class of thing", "how reliable is the source about the ontology definitions?". I want to be able to encode: "Sam is 89% trustworthy about colors", "Joe lies about AGEs", "Harry is 100% reliable when he says that Joe lies about AGEs", etc.
- Make it easy to handle attribute values that are themselves fuzzy. I.E. Probabilistic attribute values, conflicting values, cluster values, vague values, time varying values, outdated values, missing values, values whose availability is defined by some set of limits on the effort expended in finding the value (e.g. find all values of phone for joe blow that can be found within 10 seconds real time).
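A hedged sketch of what this Level I mapping might look like in Java; EavTuple, EavMapper, Person, and the trust number are all illustrative assumptions, not an actual implementation:

```java
import java.lang.reflect.Field;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Every field of any object becomes one EAV tuple, stamped with its source,
// an as-of timestamp, and a trust factor for that source ("says who?").
record EavTuple(String entity, String attribute, Object value,
                String source, Instant asOf, double sourceTrust) {}

class EavMapper {
    static List<EavTuple> toTuples(String entityId, Object obj,
                                   String source, double trust) throws Exception {
        List<EavTuple> tuples = new ArrayList<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            tuples.add(new EavTuple(entityId, f.getName(), f.get(obj),
                                    source, Instant.now(), trust));
        }
        return tuples;
    }
}

class Person {                       // hypothetical traditional class
    String firstName = "Joe";
    String lastName  = "Blow";
}

// Usage: EavMapper.toTuples("person:123", new Person(), "HR-DB", 0.95)
// yields tuples like (person:123, firstName, Joe, HR-DB, <now>, 0.95).
```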
Tuesday, June 6, 2006
- Identity() as a separate model-definable function rather than a single "key" in the form of an object pointer or reference. It would define whether multiple "things" are the "same thing". (See the sketch after this list.)
- Equals() is different from Identity() because two objects being equal is not the same as "the thing this object represents" being the same as "the thing that object represents".
- Determining the membership of "object 123" in the "set of all instances of class X" could/should be via an explicit list (along with "says who?", "as of when?", etc) rather than an intrinsic property of that object.
- Class definitions are in the mind of the "viewer" and can be applied to any object. Therefore, one should be able to use a mixture of many ontologies.
- Attributes of objects should be stored independently so that they are available to all "views", "classes", "entity types", EAV tuples, etc, etc.
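A minimal sketch of the Identity()-vs-Equals() distinction; PersonRec and the SSN rule are illustrative assumptions:

```java
import java.util.function.BiPredicate;

// Identity() as a model-definable rule rather than a pointer comparison.
public class IdentityRules {
    record PersonRec(String name, String ssn) {}

    // "Same thing?" is decided by a pluggable rule, not reference equality.
    static final BiPredicate<PersonRec, PersonRec> SAME_THING =
        (a, b) -> a.ssn() != null && a.ssn().equals(b.ssn());

    public static void main(String[] args) {
        PersonRec x = new PersonRec("Joe Blow",    "123-45-6789");
        PersonRec y = new PersonRec("Joseph Blow", "123-45-6789");
        System.out.println(x.equals(y));           // false: not Equals()
        System.out.println(SAME_THING.test(x, y)); // true: Identity() says same thing
    }
}
```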
Monday, June 5, 2006
It began with contemplating how Object-oriented modeling, and Entity-relationship modeling, and Semantic Network modeling are all roughly isomorphic to one another. Next I realized that O/O and E/R models are way too rigid because they expect a single "correct" model to work, whereas Semantic modelers pretty much know it is futile to expect everyone to use a single ontology! So, where would it take us to explore doing O/O and database development with that in mind? Next I had the intuition that Philosophy (with a capital P) probably had something to say about this topic, and so I started reading Philosophy 101 books to learn at age 50 what I never took in college. It quickly became obvious that Philosophy has SO MUCH to say about these topics that it is criminal how little explicit reference to it there is in the software engineering literature.
- When mapping Object Oriented classes to semantic networks I realized that CLASSES/SUBCLASSES etc were the same as sets of semantic-relationship-triples (Entity-Attribute-Value aka EAV records) and therefore a class hierarchy formed an ontology (as used in the semantic network/web/etc world). AHA! It is futile to get everyone to agree upon ONE ontology (from my experience), SO, that is why it is a false assumption of O/O that there can/should be a single Class hierarchy. But all O/O languages fundamentally assume this, which is why they are hard to map to relational databases. Databases explicitly provide for multiple "views" of data. And in Enterprise settings, where there are often multiple models (from different stovepipe systems) of the same basic data, this causes even more of a mismatch with the single object model.
- Mapping O/O Class hierarchies to DB E/R models to Semantic Networks brings up questions about the meaning of Identity (with a capital I) and Essential vs Accidental properties. AHA! This sounds like Philosophy (which had I not started reading about before transcribing these notes into a blog, I would have not known terms like Essential and Accidental and Identity with a capital I to even use them here), SO, it would be worth learning Philosophy to improve my Software Engineering and Computer Science skills.
- Web pages can be thought of as a database whose data model/ontology is implied. Data mining can be done on it where the URL and the "time of last update" are added to each EAV tuple extracted from the page, extending a normal EAV "fact" with a "says who?" dimension and a temporal dimension. In order to really capture all the nuances of data mined from the web, a standard data model ala O/O or E/R models has to also add some model of:
- different values at different points of time
- not only "say's who?" but "say's how?" i.e. which ontology is being used implicitly or explicitly
- only some attributes of a "thing" are being defined on any given URL
- Equality test should return a decimal probability (0..1) rather than a true/false value (see the sketch after this list)
- Find/Search operations should allow specification of thresholds to filter results
- Property "getters" become the same as "find" operations
- The result of a get/find is a set of values, each of which includes a source-of-record & time/space region, i.e. says who? when and where was this true?
- Property "setters" should accept parameters for source-of-record-spec, time/space region, data freshness, as well as probability factor, or other means of specifying cluster values, vague values, etc.
- Multiple levels of granularity with regard to setting probability of truth values for entire source-of-record as well as for individual "fact"
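A sketch of several of these items together; SourcedValue, FuzzyEntity, and the toy scoring rule are all illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Each attribute holds a set of candidate values, each tagged with its
// source-of-record, the time/space region it held for, and a probability.
record SourcedValue(Object value, String source, String period, double probability) {}

class FuzzyEntity {
    private final Map<String, List<SourcedValue>> facts = new HashMap<>();

    // "Setter": records who said it, for when/where, and how sure we are.
    void set(String attr, Object value, String source, String period, double p) {
        facts.computeIfAbsent(attr, k -> new ArrayList<>())
             .add(new SourcedValue(value, source, period, p));
    }

    // "Getter" is really a find: every candidate at or above a threshold.
    List<SourcedValue> find(String attr, double threshold) {
        return facts.getOrDefault(attr, List.of()).stream()
                    .filter(v -> v.probability() >= threshold)
                    .toList();
    }

    // Equality returns a probability in 0..1 instead of true/false.
    static double probEquals(SourcedValue a, SourcedValue b) {
        return a.value().equals(b.value())
             ? Math.min(a.probability(), b.probability())
             : 0.0;
    }
}
```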
- x is a part of y
- x is touching y
- x and y are in the set S
- Identity Criteria
- Required as Essential
- known as possible (but optional)
- unanticipated/unknown (but a value was found)
- unanticipated and not found (i.e. not conceived of)
- parts of A == parts of B but A<>B
- overlapping things like jigsaw puzzle pieces vs the objects in the completed puzzle picture
- a defacto Customer record that does not equal a "person" because the name belonged to one person but the SSN belonged to another. On the other hand, if the "customer" can really be "a married household" but the system can't handle that, then this customer record is not overlapping people, it is just incomplete. On the other other hand, how do the customer records for the husband and wife jibe with the "household"?
- "which ontology is this based on?", (i.e. "whose definitions are we using?")
- "says who?", (source of the data)
- "and when was it said?", (date source was queried)
- "over what period of time was it green?" (because values change over time)
Wednesday, April 19, 2006
"Common sense is the collection of prejudices acquired by age eighteen."
"The man who has no tincture of philosophy goes through life imprisoned in the prejudices derived from common sense, from the habitual beliefs of his age or his nation, and from convictions which have grown up in his mind without the co-operation or consent of his deliberate reason."-Bertrand Russell
"unencumbered by the thought process"
- motto of radio show Car Talk
This is a rant page, creating a location I can point to, expounding my claim that there is no such thing as common sense, and if you think that there is then you haven't gotten out into the world enough. The only reason common sense seems common is that you've only dealt with people very much like you. There are other web pages that take a similar stance.
"To determine the value of philosophy, we must first free our minds from the prejudices of what are wrongly called 'practical' men."-Bertrand Russell
A secondary belief of mine is that, real or not, so-called common sense should not be the answer to the question "Why?" or "How?". It is an answer that is a get-out-of-jail-free card for non-thinkers. It allows them to make decisions Unencumbered By The Thought Process. I've always thought this (and it doesn't matter that I might have Asperger's... LOL), but I was bolstered in my belief when I found out that most Philosophers think so little of it that they have a name for it (which isn't a compliment): Naive Realism.
See/Hear the "Making Decisions" show of Philosophy Talk where the guest talks about the documented value of skeptics & contrarians in group settings, not because they are right or wrong, but because they free up people to go against group think who otherwise wouldn't speak up. Better for decisions even if more of a pain in the butt for those going thru the process.
Episode 12 (Probability and Modern Science) of the TTC Video Series Mathematics, Philosophy, and the 'Real World' gives a couple of examples of naive intuition as a bad thing…
- A study from 1980 is mentioned in which a number of medical doctors were asked to translate the word "likely" (as in "likely to have a disease") into a percentage (as in "percent chance of having the disease"). Amazingly, the answers ranged from 20% to 95%.
- An example is given of people's intuition about statistics being very wrong, where a jury thinks there is a good chance that a witness is correct when he says he saw a blue car at night. FACTS: (1) the witness was tested as being 80% accurate when reporting car color at night, AND, (2) 15% of the cars in the population are blue and 85% are green. In fact, the probability that he was wrong is over half!
Because there are so many green cars in the total population, the witness will wrongly identify more green cars as blue (17% of the total population) [i.e. 20% wrong of the 85% green; 85 x .2 = 17] than correctly identify blue cars as blue (12% of the total population) [i.e. 80% right of the 15% blue; 15 x .8 = 12]. So, of all "blue" sightings, 17 / (17 + 12) ≈ 59% are actually green cars.
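The same arithmetic as a tiny program (numbers taken from the example above):

```java
// The jury example worked out: how often is "I saw a blue car" wrong?
public class BlueCar {
    public static void main(String[] args) {
        double pBlue = 0.15, pGreen = 0.85, accuracy = 0.80;
        double saysBlueAndIsBlue  = pBlue  * accuracy;        // 0.12
        double saysBlueAndIsGreen = pGreen * (1 - accuracy);  // 0.17
        double pWrongGivenSaysBlue =
            saysBlueAndIsGreen / (saysBlueAndIsBlue + saysBlueAndIsGreen);
        System.out.printf("P(actually green | says blue) = %.1f%%%n",
                          pWrongGivenSaysBlue * 100);         // about 58.6%
    }
}
```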
Thursday, April 13, 2006
Souls as property containers
When medieval alchemists distilled liquids into their essences they called them spirits because they were the "soul" of the grape, herb, flower, etc. To ensure that all of the accidental properties were removed, they distilled things 5 times to produce the quintessential spirit. Distilled spirits in the alcohol sense often have names that reflect the notion that they have a life essence captured in them. Whiskey and Aquavit are both names that translate into "water of life" in their original languages. In the movie Perfume, a villain repeatedly attempts to distill the essence of pretty women using the same techniques as distilling flowers into perfume. [Spoiler Alert: flowers don't survive the process...]
When the Well of Souls runs out of RAM