Tuesday, August 10, 2010

Not All Properties Are Created Equal (Part I: Essentials)

Just a few years ago, I started reading “Philosophy 101” type books and was immediately surprised at how relevant the ideas were to day-to-day software development. So relevant, in fact, that the even bigger surprise is that these ideas are not part of the basic computer science curriculum, nor mentioned in technical books & magazines (with the possible exception of graduate-level artificial intelligence). The following idea was the first I encountered that made it clear to me that programmers need to know what philosophers already know. It was also my first clue that philosophy has a whole body of knowledge about developing data & object models that computer science books leave up to intuition.

Did you know that 2500 years ago, philosophers like Plato and Aristotle were doing Object-Oriented Analysis and Entity-Relationship Modeling? More surprisingly, they were already more sophisticated than software developers are now!   Why?  Well, for one thing, they already understood that the properties of an object are not all created equal.  Whereas programmers today basically think that a property is a property, even ancient philosophers understood that there are several different categories of properties. And most importantly, only some properties define WHAT something is; the other properties merely describe HOW it is. Embracing this distinction will change the way you carve up the world into classes and relationships, and which attributes you assign to which entities.  I have come to feel that creating data models without this distinction is like wearing glasses so out of focus that distinct objects blur together into indistinguishable blobs. 

The properties that something must have, in order to be what it is, Philosophy calls Essential properties. Those properties that something may optionally have are called Accidental properties. For example, for you to be a human, you must have human DNA; it is an Essential property. Being named Smith, however, is an Accidental property because you would still be human even if you had a different name or even no name. So, our everyday meaning for “essential” is different from the philosophical meaning. If a client says that the address is an essential part of the customer data, that doesn’t mean that it is Essential in the philosophical sense. In fact, it is not Essential because whatever a “customer” is, it will still be that same kind of thing even if its address changes. The distinction between essential and accidental properties is even embedded in some human languages, like Irish, which has two different “is” verbs: “Tá” is used for accidentals like “He is hungry”, but the verb called the Copula is used for essentials like “He is Hungarian”.

A hallmark of Essential properties is that they are unchanging. An object’s Essential properties cannot change without that object becoming a different kind of thing. There is an ancient philosophical paradox of how something can change and yet remain the same. You are different from how you were as a child, and yet you are still the same you. As Heraclitus said, "You can’t step into the same river twice because the water is always different." The solution the Greeks came up with was that Accidental properties may change but Essential properties must remain the same (otherwise, a metamorphosis has occurred!). This philosophy is known as Essentialism.
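One way to make this distinction visible in code is to put the Essential properties in an immutable value and leave the Accidental ones mutable. The following is only a minimal sketch in Python under my own assumptions: the names `Essence`, `Individual`, and `species` are hypothetical, chosen for illustration, and are not a standard pattern from any library.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class Essence:
    """Essential properties: they define WHAT the thing is and never change."""
    species: str  # hypothetical stand-in for "has human DNA"


@dataclass
class Individual:
    """An object with one fixed essence and any number of changeable accidentals."""
    essence: Essence
    name: Optional[str] = None  # accidental: may change or be absent
    hungry: bool = False        # accidental: changes all the time


someone = Individual(Essence(species="human"), name="Smith")

# Accidental change: still the same kind of thing.
someone.name = "Jones"
someone.hungry = True

# Essential change: the frozen dataclass refuses the assignment.
try:
    someone.essence.species = "river"
    essence_changed = True
except FrozenInstanceError:
    essence_changed = False
```

Here renaming Smith to Jones goes through, while the attempt to rewrite the essence raises `FrozenInstanceError`. A real model would need a deliberate, separate operation for a true "metamorphosis".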

Socrates taught that everything has an “essential nature” that makes it the kind of thing that it is. His pupil, Plato, taught that these essences are manifest in ideal “Forms” of which all objects are mere copies. Plato’s pupil, Aristotle, taught that Essential properties were those that defined a Form, and Accidental properties were those which distinguished one individual object from another of the same kind.  Our object-oriented programming notion of Class is analogous to Plato’s Forms. Like a Class, a Form is unchanging and it pre-exists any objects which instantiate it. Naturally, Entity tables are the database equivalent of Forms with their records being the objects.

So, what is using this idea supposed to buy me? I think a case can be made for at least the following:
  1. Better definitions of entities, classes, and relationships result because it forces you to weed out all the non-essentials (pun intended). By striving to understand entity essentials, and not just normalizing data tuples, you will be more likely to accurately model the world.
  2. Better specifications result because there will now be a place to put all those unwritten (even unspoken) assumptions about the nature of the problem domain. Ironically, when gathering requirements and doing analysis, the essential properties of things are often given short shrift because they mostly don't get stored in databases, because... that's right, they don't change! It is the changeable accidental properties that get stored, with the unchanging essential properties getting buried as hardwired assumptions in the programming logic.
  3. Better system interoperability results because universal essential data is separated from local accidental data. The integration of data between a customer system and a patient system and an employee system would be much easier if they had modeled the essential entity, which is Person. When customer and patient and employee are recognized as merely accidental roles of a Person, there is an immediate common entity type to synchronize on rather than widely divergent data tables.
  4. Better identity systems result because life-long identifiers will no longer be confused with changeable properties like names, addresses, and phone numbers.
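The last point can be sketched in a few lines of Python. This is an illustration under assumptions, not a recommended design: `PersonRecord`, `person_id`, `compound_key`, and the sample orders are all hypothetical names and data made up for the example.

```python
from dataclasses import dataclass


@dataclass
class PersonRecord:
    person_id: str  # hypothetical life-long identifier
    name: str       # accidental: can change
    address: str    # accidental: can change


def compound_key(record):
    """Build a key from accidental properties only."""
    return (record.name, record.address)


pat = PersonRecord(person_id="P-001", name="Pat Smith", address="1 Elm St")

# Two hypothetical stores holding references to the same person.
orders_by_compound_key = {compound_key(pat): ["order-17"]}
orders_by_person_id = {pat.person_id: ["order-17"]}

# An accidental property changes: Pat moves house.
pat.address = "9 Oak Ave"

# The reference stored under the old compound key no longer matches.
found_by_compound_key = orders_by_compound_key.get(compound_key(pat))
# The life-long identifier still finds the same orders.
found_by_person_id = orders_by_person_id.get(pat.person_id)

print(found_by_compound_key)
print(found_by_person_id)
```

After the move, the lookup by compound key returns `None` while the lookup by `person_id` still returns the order list, which is the whole argument for not confusing identifiers with changeable properties.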
Examples of using Essentialism
  • Imagine a Person table where we already understand that using the Name column as the primary key is a bad idea, simply because names are not unique. Some are tempted to create a compound key using name plus some other column(s) like address, phone, etc. With an Essentialism perspective it is clear that, while the compound key may be unique for the moment, it is composed of accidental properties and hence can change at any time! Stored references to previous keys will fail. Current keys won’t match future keys.

    We want keys using essential attributes that remain fixed for the lifetime of the Person. A unique, fixed, objective, essential attribute like the person’s complete DNA sequence would do the trick! However, a government-issued tax ID like SSN can be a practical substitute, plus it’s better than a proprietary “customer ID” because it is something that is effectively universal, and therefore can be used to integrate databases from multiple sources.

    Lest you think this example is contrived, I witnessed a Top-5-in-the-USA bank design a customer identity system using a key composed of accidental properties rather than SSN because “we have not traditionally collected SSNs” (despite government “know your customer” security laws requiring it!)  They had to continually fix their ever-changing data because they only focused on having a “unique key”.

  • In tutorials for any technology that uses entities, the example of a “customer” entity is almost a cliché. The design of databases, XML schemas, UML diagrams, SOAP messages, Java classes, etc., have all used it. But when we ponder the essential nature of a “customer”, there is an immediate problem…

    Philosophers have devised many systems for organizing “what exists”, and one of the first of their “20 questions” is: Is its essence physical or abstract? A customer can be either a person (a physical thing that exists in time and space) or a corporation (an abstract thing that doesn't exist in space), so which is it? This is our clue that it isn’t an entity at all. With some thought (and the benefit of reading more philosophy), it becomes clear that “customer” is really just a role that different entities can play. It is part of the relationship between the entity acting as customer, client, or buyer and the entity acting as vendor, seller, or provider.
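To make the "customer is a role, not an entity" idea concrete, here is a small Python sketch. It is only one possible reading, and every name in it (`Person`, `Corporation`, `Party`, `CustomerRelationship`, `customers_of`, the sample IDs) is a hypothetical choice of mine rather than anything prescribed by the philosophy.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Person:
    person_id: str  # hypothetical stable identifier for a physical entity


@dataclass(frozen=True)
class Corporation:
    corp_id: str    # hypothetical stable identifier for an abstract entity


# Either kind of entity can play a role in a relationship.
Party = Union[Person, Corporation]


@dataclass(frozen=True)
class CustomerRelationship:
    """'Customer' and 'vendor' exist only as the two ends of this relationship."""
    customer: Party
    vendor: Party


def customers_of(vendor: Party, rels: List[CustomerRelationship]) -> List[Party]:
    """Return every party that plays the customer role toward the given vendor."""
    return [r.customer for r in rels if r.vendor == vendor]


alice = Person("P-001")
acme = Corporation("C-100")
supplier = Corporation("C-200")

relationships = [
    CustomerRelationship(customer=alice, vendor=acme),     # a person as customer
    CustomerRelationship(customer=acme, vendor=supplier),  # a corporation as customer
]

acme_customers = customers_of(acme, relationships)
supplier_customers = customers_of(supplier, relationships)
```

Notice that there is no `Customer` class at all: `acme` is a vendor to `alice` and a customer of `supplier` at the same time, without being two different entities.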
Whatever interpretation you give to the essential-ness of this or that property, the main point is that it is worth knowing that there IS a distinction. And more generally, Philosophy has some ideas to which we programmers need to be exposed.

4 comments:

  1. Great post Bruce. One thing to consider... your DNA can change over time. It happens rarely, and the changes are small, yet it's worth considering that DNA is not always fixed for the life-time of the person.

  2. Right. As I wrote in the postscript at
    http://existentialprogramming.blogspot.com/2010/03/model-entities-not-just-their-parts.html

    "I almost said that a person's DNA was unique until I remembered twins, triplets,etc. So, I started to add a reference to DNA+epigenetics in order to be able to say "unique", except that the "never changes" bit would no longer be true.

    Interesting "paradox" that epigenetics is needed to uniquely define/identify someone but it is ever-changing. This gets at the heart of the notion of essential versus accidental...it is unique to something but not essential to that same something."

  3. Note that SSNs are not unique and are reused after the death of those who possess them, and Bruce Schneier already pointed to a number of attacks/scams, mostly related to faking credit history and/or other personality attributes with duplicate SSNs :)

    I think that a proper DB should store a history of an "essential individual" and the changes to its accidental properties over time

  4. While SSNs aren't perfect, they illustrate the type of property that stays unchanged over the lifetime of the entity (not to mention it has the bonus of being verifiable by the Federal Government: http://www.ssa.gov/employer/ssnv.htm ).
