Monday, June 5, 2006

The Original Epiphanies of Existential Programming

The items below summarize the several AHA! moments I had during May and June 2006. [see my standard disclaimers]
It began with contemplating how Object-oriented modeling, Entity-relationship modeling, and Semantic Network modeling are all isomorphic to each other. Next I realized that O/O and E/R models are far too rigid because they assume a single "correct" model will work, whereas semantic modelers know it is futile to expect everyone to use a single ontology! So where would it take us to approach O/O and database development with that in mind? Next I had the intuition that Philosophy (with a capital P) probably had something to say about this topic, so I started reading Philosophy 101 books to learn at age 50 what I never took in college. It quickly became obvious that Philosophy has SO MUCH to say about these topics that it is criminal how rarely the software engineering literature refers to it explicitly.

  1. When mapping Object Oriented classes to semantic networks, I realized that CLASSES/SUBCLASSES etc. are equivalent to sets of semantic-relationship triples (Entity-Attribute-Value aka EAV records), and therefore a class hierarchy forms an ontology (as used in the semantic network/web world). AHA! In my experience it is futile to get everyone to agree upon ONE ontology, SO the O/O assumption that there can/should be a single Class hierarchy is false. But all O/O languages fundamentally assume it, which is why they are hard to map to relational databases. Databases explicitly provide for multiple "views" of data. And in Enterprise settings, where there are often multiple models (from different stovepipe systems) of the same basic data, the mismatch with a single object model is even worse.
  2. Mapping O/O Class hierarchies to DB E/R models to Semantic Networks raises questions about the meaning of Identity (with a capital I) and about Essential vs Accidental properties. AHA! This sounds like Philosophy. (Had I not started reading Philosophy before transcribing these notes into a blog, I would not even have known the terms Essential, Accidental, and Identity with a capital I to use them here.) SO it would be worth learning Philosophy to improve my Software Engineering and Computer Science skills.
  3. Having now worked with both Java and Javascript deeply enough to understand class-based versus prototype-based languages (see my AJAX articles), I see that Java is like Plato's view of the world (every object is an instance of some predefined ideal Form, its class), while Javascript is more like Existentialism (existence precedes essence: an object can be instantiated without saying what "type" it is).
  4. Web pages can be thought of as a database whose data model/ontology is implicit. Data mining can be done on it by adding the URL and the "time of last update" to each EAV tuple extracted from a page, extending a normal EAV "fact" with a "says who?" dimension and a temporal dimension. To really capture all the nuances of data mined from the web, a standard data model à la O/O or E/R also has to add some model of:
    • completeness
    • accuracy
    • different values at different points of time
    • not only "says who?" but "says how?", i.e. which ontology is being used, implicitly or explicitly
    • only some attributes of a "thing" are being defined on any given URL
  5. O/O languages could/should be extended to make it easier to work with arbitrary sets of semantic network relationships/tuples, so that they can handle integration of various (E/R, Enterprise, web page, data mining) data models.
  6. Google, Homeland Security, and corporate data warehouses would all benefit from being able to work with "everything we know about X". This could be a good technique for integrating disparate data sources.
  7. O/O languages need to be more like Javascript in letting any set of attributes be associated with an object. "Classes" should then be more like "roles" or interfaces that the VIEWER chooses, instead of tightly coupling the attribute set to a predefined list. The VIEW chosen by the viewer/programmer can still be type-safe once chosen, BUT it can't assume the source of the data used the same "view".
  8. "View" (see above) includes all aspects of traditional classes PLUS parameters for deciding trustworthiness, for deciding the "identity" of the thing the attributes describe, and for all other "unassumable" things. An O/O language could set defaults for these parameters that match the assumptions of traditional programming languages.
  9. Searching the web and trying to integrate the data is much like trying to integrate data from disparate silo systems into a single enterprise data model or data warehouse. Both need to take into account where each data value came from, how accurate/reliable those sources are, how their ontologies map to each other, and how to accumulate attributes from different sources about the same entity.
  10. When dealing with the sort of imprecise, unreliable object property values found on the web, the following are needed as part of the "ontology" defined to work with that data:
    • Equality tests should return a decimal probability (0..1) rather than a true/false value
    • Find/Search operations should allow specification of thresholds to filter results
    • Property "getters" become the same as "find" operations
    • The result of a get/find is a set of values, each including a source-of-record and a time/space region, i.e. says who? and when and where was this true?
    • Property "setters" should accept parameters for source-of-record-spec, time/space region, data freshness, as well as probability factor, or other means of specifying cluster values, vague values, etc.
    • Multiple levels of granularity for assigning truth probabilities, both to an entire source-of-record and to an individual "fact"
  11. How do we decide what a thing is? What "level" of abstraction/reality is it on? E.g., an asteroid can be a loose collection of pebbles, which means the parts of something don't always "touch" the thing. So what is the real difference between the following:
    • x is a part of y
    • x is touching y
    • x and y are in the set S
  12. How should null attribute values be interpreted? What is the difference between "definitely has no value" and "don't know the value"? Attributes of X (according to some given ontology) are one of:
    • Identity Criteria
    • Required as Essential
    • known as possible (but optional)
    • unanticipated/unknown (but a value was found)
    • unanticipated and not found (i.e. not conceived of)
  13. It is a big deal to understand the borderline between the set of "things" (aka entities, objects) and the set of "values" (e.g. 1, 2, 3, a, b, c, true, false, etc.), especially since many OO languages represent them all as "objects".
  14. It is a big deal to handle ontologies that mismatch each other on "what is a thing" and "where does one thing end and another begin". E.g.:
    • parts of A == parts of B but A<>B
    • overlapping things like jigsaw puzzle pieces vs the objects in the completed puzzle picture
    • a de facto Customer record that does not correspond to one "person" because the name belonged to one person but the SSN belonged to another. On the other hand, if the "customer" can really be "a married household" but the system can't represent that, then the record is not overlapping people, it is just incomplete. On the other other hand, how do the customer records for the husband and wife jibe with the "household"?
  15. There are attributes of an entity and there are "meta-attributes". E.g. an EAV tuple for an attribute could be (object123, color, green) [where "color" and "green" should be defined in the ontology in question]. Meta-attributes could be...
    • "which ontology is this based on?", (i.e. "whose definitions are we using?")
    • "says who?", (source of the data)
    • "and when was it said?", (date source was queried)
    • "over what period of time was it green?" (because values change over time)
  16. If objects can have arbitrary collections of attributes and are not any definite "thing", then how do you know what-is-a/when-to-create-a-new-instance-of-the "thing"? And where does one "thing" end and the next one begin?
  17. Intuitively, people agree on where one person begins and another ends even if we can't define how/why. This is not true of abstract concepts. Modeling should find easy-to-recognize real-world entities and use them in preference to concepts (which are often roles anyway, like customer or prospect or employee).
  18. People "know" other people (i.e. recognize them later) via shared "events" that both can verify to each other. [Just like the shared secret PIN between you and the bank, and now, increasingly, all sorts of personal questions like "what's your favorite movie?"]
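
Item 1 above can be made concrete. Here is a minimal Python sketch, assuming a plain Python object stands in for an O/O instance. The function names and the `instance_of`/`subclass_of` attribute names are my own hypothetical choices, not any standard vocabulary. The sketch flattens both an object and its class hierarchy into EAV triples. Once the hierarchy is just data, a second department's conflicting hierarchy (here a hypothetical "Party" superclass) becomes a few more rows instead of a rival class tree.

```python
# Hypothetical sketch for item 1: an object and its class hierarchy both
# flattened into (entity, attribute, value) triples.

class Person:
    pass


class Customer(Person):
    pass


def object_to_triples(entity_id, obj):
    """One 'instance_of' triple for the object's class, then one triple per
    instance attribute (sorted by attribute name for stable output)."""
    triples = [(entity_id, "instance_of", type(obj).__name__)]
    for attr, value in sorted(vars(obj).items()):
        triples.append((entity_id, attr, value))
    return triples


def hierarchy_to_triples(cls):
    """One 'subclass_of' triple per direct base of every class in cls's MRO,
    skipping the implicit 'object' root."""
    triples = []
    for klass in cls.__mro__:
        for base in klass.__bases__:
            if base is not object:
                triples.append((klass.__name__, "subclass_of", base.__name__))
    return triples


c = Customer()
c.name = "Ann Smith"
c.credit_limit = 500
facts = object_to_triples("cust42", c) + hierarchy_to_triples(Customer)
# A second ontology is just more triples, not a competing class hierarchy:
facts.append(("Customer", "subclass_of", "Party"))
for fact in facts:
    print(fact)
```

Nothing in the triple list forces one hierarchy to be "the" correct one. Deciding which `subclass_of` rows to honor becomes a choice made at query time.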

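Items 4, 6, 9 and 15 all come down to storing each EAV fact together with its meta-attributes. Below is a minimal Python sketch, assuming calendar dates are precise enough for the temporal dimension and that an ontology can be identified by a plain string label. The `Fact` field names, the `example.com`/`example.org` URLs and the `paint-vocab` label are hypothetical. The sketch answers "everything we know about X" and "what was true on a given day".

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Fact:
    """An EAV triple plus the meta-attributes from item 15 (hypothetical field names)."""
    entity: str
    attribute: str
    value: Any
    ontology: str                       # whose definitions are we using?
    source: str                         # says who? (e.g. the URL it was mined from)
    retrieved: date                     # when was it said? (date the source was queried)
    valid_from: Optional[date] = None   # start of the period it was true (None = unknown)
    valid_to: Optional[date] = None     # end of that period (None = open-ended/unknown)


def everything_about(facts: List[Fact], entity: str) -> Dict[str, List[Fact]]:
    """Item 6: every fact held about one entity, grouped by attribute."""
    grouped: Dict[str, List[Fact]] = {}
    for f in facts:
        if f.entity == entity:
            grouped.setdefault(f.attribute, []).append(f)
    return grouped


def true_on(facts: List[Fact], day: date) -> List[Fact]:
    """Facts whose validity period covers `day`; unknown bounds are treated as open."""
    return [f for f in facts
            if (f.valid_from is None or f.valid_from <= day)
            and (f.valid_to is None or day <= f.valid_to)]


facts = [
    Fact("object123", "color", "green", "paint-vocab", "http://example.com/a",
         date(2006, 5, 1), valid_from=date(2005, 1, 1), valid_to=date(2005, 12, 31)),
    Fact("object123", "color", "red", "paint-vocab", "http://example.org/b",
         date(2006, 6, 1), valid_from=date(2006, 1, 1)),
    Fact("object456", "color", "blue", "paint-vocab", "http://example.org/b",
         date(2006, 6, 1)),
]
about = everything_about(facts, "object123")
print([f.value for f in about["color"]])
print([f.source for f in true_on(about["color"], date(2006, 6, 5))])
```

Both colors survive in the store. The temporal filter, not an overwrite, decides which one applies on a given day.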

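Items 7 and 8 suggest that the *viewer* picks the "class" and the trust policy, not the data. Below is a minimal Python sketch, assuming the data about one entity arrives as `attribute -> {source: value}` and that trust is expressed as nothing more than an ordered list of preferred sources. `View`, `apply` and the source names `crm`/`web` are hypothetical. Once a view is chosen, it returns a fixed-shape `namedtuple`, which is as close to "type-safe once chosen" as plain Python gets.

```python
from collections import namedtuple
from typing import Any, Dict, Iterable

# Schemaless data about one entity: attribute -> {source: value}.
# Sources may disagree, and none of them has to use the viewer's "class".
Facts = Dict[str, Dict[str, Any]]


class View:
    """A viewer-chosen 'class': the attributes this viewer requires plus a
    trust policy (an ordered list of sources, highest priority first).
    Hypothetical sketch, not an existing API."""

    def __init__(self, name: str, required: Iterable[str], trusted_sources: Iterable[str]):
        self.name = name
        self.required = tuple(required)
        self.trusted = list(trusted_sources)
        # Fixed-shape record type: once the view is chosen, its fields are known.
        self.record_type = namedtuple(name, self.required)

    def apply(self, facts: Facts):
        values = {}
        for attr in self.required:
            by_source = facts.get(attr, {})
            for source in self.trusted:          # first trusted source wins
                if source in by_source:
                    values[attr] = by_source[source]
                    break
            else:
                raise ValueError(f"{self.name}: no trusted value for {attr!r}")
        return self.record_type(**values)


ann = {
    "name": {"crm": "Ann Smith", "web": "A. Smith"},
    "email": {"web": "ann@example.com"},
}
billing = View("Billing", ["name"], ["crm", "web"])
marketing = View("Marketing", ["name", "email"], ["web"])
print(billing.apply(ann))
print(marketing.apply(ann))
```

The same underlying facts yield different, internally consistent records for different viewers. Passing a single owning system as the only trusted source roughly recovers the traditional single-model default that item 8 mentions.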

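The item 10 wish list translates into a small API. Below is a minimal Python sketch, assuming confidence is simply source-level trust multiplied by fact-level probability, and that equality between two records is the share of their common attributes that agree. Both formulas are placeholder assumptions, not an established method. `FuzzyStore`, `set_value`, `get_values`, `find` and `equality` are hypothetical names, and "cluster values" and "vague values" are not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    value: Any
    source: str                   # source-of-record: says who?
    region: Optional[str] = None  # free-form time/space region (an assumption)
    probability: float = 1.0      # confidence in this individual fact, 0..1


@dataclass
class FuzzyStore:
    """Hypothetical sketch of the item 10 operations."""
    # Source-level trust, 0..1; unknown sources get 0.5 (an arbitrary assumption).
    source_trust: Dict[str, float] = field(default_factory=dict)
    facts: Dict[Tuple[str, str], List[Candidate]] = field(default_factory=dict)

    def set_value(self, entity: str, attr: str, value: Any, source: str,
                  region: Optional[str] = None, probability: float = 1.0) -> None:
        """A 'setter' records one more candidate; it never overwrites other sources."""
        self.facts.setdefault((entity, attr), []).append(
            Candidate(value, source, region, probability))

    def confidence(self, c: Candidate) -> float:
        """Two levels of granularity: trust in the source times trust in the fact."""
        return self.source_trust.get(c.source, 0.5) * c.probability

    def get_values(self, entity: str, attr: str, threshold: float = 0.0) -> List[Candidate]:
        """A 'getter' is a find: all candidates at or above threshold, most confident first."""
        found = [c for c in self.facts.get((entity, attr), [])
                 if self.confidence(c) >= threshold]
        return sorted(found, key=self.confidence, reverse=True)

    def find(self, attr: str, value: Any, threshold: float = 0.5) -> List[str]:
        """Entities whose attr equals value with confidence >= threshold."""
        return sorted({entity for (entity, a), cands in self.facts.items() if a == attr
                       for c in cands if c.value == value and self.confidence(c) >= threshold})


def equality(a: Dict[str, Any], b: Dict[str, Any]) -> float:
    """Probabilistic equality: share of common attributes whose values agree (0..1)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    return sum(a[k] == b[k] for k in common) / len(common)


store = FuzzyStore(source_trust={"http://example.com/a": 0.9, "http://example.org/b": 0.5})
store.set_value("object123", "color", "green", "http://example.com/a", region="2005")
store.set_value("object123", "color", "red", "http://example.org/b", probability=0.8)
print([c.value for c in store.get_values("object123", "color")])
print([c.value for c in store.get_values("object123", "color", threshold=0.5)])
print(equality({"name": "Ann", "city": "Boston"}, {"name": "Ann", "city": "Austin", "age": 40}))
```

With these numbers, "green" scores 0.9 × 1.0 = 0.9 and "red" scores 0.5 × 0.8 = 0.4. A threshold of 0.5 therefore keeps only "green", while both values remain in the store for a viewer with a lower threshold.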

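Item 12's distinctions can be kept apart in code instead of collapsing into a single `null`. Below is a minimal Python sketch, assuming an ontology is just a dict from attribute name to role and a record is a dict of whatever values were found. `Role`, `Absent`, `classify`, `lookup` and `missing_required` are hypothetical names. "Has definitely no value" is something a source must assert explicitly, while "don't know" is simply the absence of any assertion.

```python
from enum import Enum
from typing import Any, Dict, List


class Role(Enum):
    """The five kinds of attribute listed in item 12 (labels are hypothetical)."""
    IDENTITY = "identity criteria"
    ESSENTIAL = "required as essential"
    OPTIONAL = "known as possible (but optional)"
    UNANTICIPATED_FOUND = "unanticipated/unknown (but a value was found)"
    UNANTICIPATED_NOT_FOUND = "unanticipated and not found"


class Absent(Enum):
    """Two different meanings of null."""
    NO_VALUE = "has definitely no value"   # asserted explicitly by a source
    UNKNOWN = "don't know the value"       # nothing was recorded at all


def classify(ontology: Dict[str, Role], record: Dict[str, Any], attr: str) -> Role:
    """Role of attr per the ontology, or one of the two 'unanticipated' roles."""
    if attr in ontology:
        return ontology[attr]
    return Role.UNANTICIPATED_FOUND if attr in record else Role.UNANTICIPATED_NOT_FOUND


def lookup(record: Dict[str, Any], attr: str) -> Any:
    """The stored value (which may be Absent.NO_VALUE), else Absent.UNKNOWN."""
    return record.get(attr, Absent.UNKNOWN)


def missing_required(ontology: Dict[str, Role], record: Dict[str, Any]) -> List[str]:
    """Identity and essential attributes whose values are still unknown."""
    return sorted(a for a, r in ontology.items()
                  if r in (Role.IDENTITY, Role.ESSENTIAL)
                  and lookup(record, a) is Absent.UNKNOWN)


ontology = {"employee_id": Role.IDENTITY, "name": Role.ESSENTIAL, "spouse": Role.OPTIONAL}
record = {"employee_id": "E-1001", "spouse": Absent.NO_VALUE, "shoe_size": 9}
print(lookup(record, "spouse"), lookup(record, "name"))
print(classify(ontology, record, "shoe_size"), missing_required(ontology, record))
```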