Thursday, March 4, 2010

Model Entities, not just their parts

One of the oldest puzzles in philosophy is the paradox of how something can change and yet still be considered the same thing. After all, if “same” is defined as “identical; not different; unchanged”, then how can it “change”? On the other hand, even if I lose that hand (pun intended), I am still the same me. Chapter 5 of Peter Cave’s new book, “this sentence is false”[1], collects example paradoxes that illustrate how inconsistent our intuitions about “sameness” are. Some paradoxes involve entities (or properties) whose definition is “vague”, as in “How many cows make up a herd?” or “At what weight does adding a pound change you into being ‘fat’?” Here, however, I will focus on the change paradoxes involving things with a well-defined set of parts. They illustrate the problem with defining something as merely the collection of its parts (unless, of course, “it” is truly only a collection and not an entity in its own right).
George Washington's axe
Harry: I have here the very axe with which George Washington chopped down the cherry tree. It’s been used by my family for generations.
Sally: But this says “Made in China”!
Harry:  Well, over the years, the handle was replaced each time it wore out. Oh, and the blade’s been replaced a couple of times too.
Sally: But those are the only two parts…that’s not the same axe at all then!!

Ship of Theseus
(original paradox by Plutarch)
Theseus had a ship whose parts were replaced over time such that, at a certain point, no original pieces were left.
How can the latter ship be said to be the same ship as the original if they have no parts in common?

(sequel paradox by Hobbes)
Suppose that those old parts were stockpiled as they were being replaced, and later they were reassembled to make a ship.
NOW, which ship is the same as the original ship; the one with the original parts, or, the one with the replacement parts?

At the bottom of these paradoxes is the question of whether a thing made up of parts is the same as the collection of all its parts. That is, can everything that can be said of the whole thing equally be said of the collection of all its parts, and vice versa? For some 2500 years, from Socrates, Plato, and Aristotle right through to the 21st century, Western philosophers have been debating this question, generating whole libraries of books and papers. In fact, mereology is an entire field of study devoted to the relationship between parts and their respective wholes.

What does it mean to be an individual?

As discussed (at great length) in the book Parts[2], there is a whole spectrum of things in between “individuals” and “groups”, and they are referred to in everyday language by singular terms (e.g. person), plural terms (e.g. feet), and some words that could mean either (e.g. hair). There are individuals (say, a car), parts of individuals that are themselves individuals (say, a wheel), parts of individuals that are NOT themselves individuals (say, the paint), collections that do not form an individual (say, “the wheels of that car”), collections that DO constitute an individual (say, the car parts that comprise the engine where the engine is itself an individual), and so on, and so on.

A key to distinguishing whether a thing being referred to is truly a thing in its own right (and not just a plural reference masquerading as a single thing) is what sorts of things can be said about it.  Orchestra is an ambiguous term because it can be used as a singular or a plural as in “the orchestra IS playing” vs the equally grammatical “the orchestra ARE playing”.  If it is considered an individual then we can say things about its creation, its history, etc, whereas the plural use simply denotes a collection of players where not much can be said about “it” apart from the count of players, their average age, etc.  Relational Database programmers will recognize individuals as those that get their own record in some entity table, and plurals/sets/collections as equivalent to the result set from some arbitrary query.  SQL aggregate functions (like count, average, minimum, maximum, etc) are the only things that can be said about the result set as a whole. Result sets do not get primary keys because they are not a “thing”, whereas real individuals do (or should!) get their own personal identity key.  Even when an arbitrary query is made to look like an entity by defining a “view”, it is not always possible to perform updates against the search results because the view is not a real entity.
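
For Java programmers, the same contrast can be sketched roughly as follows. This is only an illustration under assumptions: the Player and Orchestra classes and all their fields are hypothetical names invented here, not part of any real schema. The Orchestra object plays the role of a record in an entity table, while the bare list of players plays the role of a query result set.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical classes for illustration only.
final class Player {
    final String name;
    final int age;

    Player(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// The orchestra as an individual: it has its own identity key and its own
// history, independent of who happens to be playing in it today.
final class Orchestra {
    final long orchestraId;   // plays the role of a primary key
    final int foundedYear;    // a fact about the whole, not about any player
    final List<Player> members;

    Orchestra(long orchestraId, int foundedYear, List<Player> members) {
        this.orchestraId = orchestraId;
        this.foundedYear = foundedYear;
        this.members = members;
    }
}

final class OrchestraDemo {
    // The orchestra as a plural: just a collection, so only aggregates
    // (count, average, and so on) can be said about it as a whole.
    static double averageAge(List<Player> players) {
        return players.stream().mapToInt(p -> p.age).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<Player> players = Arrays.asList(new Player("Ann", 30), new Player("Bob", 40));
        Orchestra orchestra = new Orchestra(1L, 1950, players);

        System.out.println(players.size());          // 2
        System.out.println(averageAge(players));     // 35.0
        System.out.println(orchestra.foundedYear);   // 1950
    }
}
```

Note that the founding year has nowhere to live in the plain list of players; it needs the entity.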

What does it mean to be the same?

A big problem is that there are many different flavors of “sameness” when we say that A is the same as B. Right off the bat, there is a difference between qualitative identity and numerical identity. Two things are qualitatively identical if they are duplicates, like a pair of dice. Two things are numerically identical if they are one and the same thing, like the Morning Star and the Evening Star (both of which are, in fact, the planet Venus). They are “numerically” identical in that, when counting things, they only count as one thing. Another complication is the difference between identity (right this second) and identity over time, which deals with the whole question of how something can be different at two different times and yet still be considered the same thing. For example, you are still considered numerically identical to the you of your youth even though you have clearly changed…although this gets into the even more involved topic of Personal Identity [which may or may not apply to an axe ;-) ]. Traditionally, if x was identical to y, and y was identical to z, then x had to be identical to z. Relative Identity has been proposed such that this need not be true, thus allowing both the morning and evening stars to be identical to Venus but not to each other.
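
Java programmers meet the first distinction every day: qualitative identity corresponds loosely to equals(), and numerical identity to the == operator on object references. A minimal sketch, assuming a hypothetical Die class written just for this illustration:

```java
import java.util.Objects;

// Hypothetical class for illustration: two dice with the same color and
// number of sides are duplicates (qualitatively identical).
final class Die {
    private final String color;
    private final int sides;

    Die(String color, int sides) {
        this.color = color;
        this.sides = sides;
    }

    // Qualitative identity: every compared property matches.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Die)) {
            return false;
        }
        Die that = (Die) other;
        return sides == that.sides && color.equals(that.color);
    }

    @Override
    public int hashCode() {
        return Objects.hash(color, sides);
    }
}

final class SamenessDemo {
    // Numerical identity: both references denote one and the same object.
    static boolean numericallyIdentical(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        Die first = new Die("red", 6);
        Die second = new Die("red", 6);   // a duplicate, but a second die
        Die morningStar = first;          // two names for one object
        Die eveningStar = first;

        System.out.println(first.equals(second));                            // true
        System.out.println(numericallyIdentical(first, second));             // false
        System.out.println(numericallyIdentical(morningStar, eveningStar));  // true
    }
}
```

The sketch only covers identity at a single moment; identity over time has no such built-in operator.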

When specifically asking whether the paradoxical ships and axes are numerically identical, as Peter Cave points out, two of our usual criteria for being “one and the same thing” are in conflict.  They are (a) being composed of largely the same set of parts, and (b) being appropriately continuous through some region of space and time.  The continually refurbished ship meets (b) but the reassembled original parts meet (a).

In traditional logic, as formulated in Leibniz’s Law, two things are the “same” only if everything that can be said about one thing can also be said about the other. In other words, all the properties of each object/entity need to be equal if they are one and the same.  By this token, the two axes and the various ships are not the same.  Of course, this means that ANY change to ANY property causes the new thing to not be “the same” as the old. To avoid this, others have said that only essential and not accidental properties should be compared.  This means that the definitions of “ship” and “axe” should distinguish between those properties that must remain the same throughout the lifetime of the object versus those properties that may change over time.
Java programmers can relate to the philosophical meanings of “essential” and “accidental” in the following way. [To keep this sidebar simple, think of “entity beans”, where only one bean/object/instance is allowed to represent a particular real-world entity (e.g. {name=Joe Blow,ssn=123456789}); i.e. there are never multiple object instances in RAM simultaneously representing Joe.] Class definitions could implement “essential” properties as constants (i.e. final instance variables initialized in the constructor, à la the Immutable design pattern) and “accidental” properties as normal, mutable instance variables.

The essential properties must be final because if their values were different, the object would have to represent a different individual. For example, if an instance of class Person has a constant DNA_Fingerprint_Code with a value of 1234567890, it would not be correct to change that value on that same object, because a person’s DNA both defines them and never changes; i.e. it is “essential” in the philosophical sense. The correct procedure would be to create a new instance of Person, because it must truly be a different person if it has different DNA. [Of course, this brings up the whole separate topic of the difference between changing a property’s value because it has a truly new value versus merely correcting a mistaken value. Computer software is not normally designed to make this distinction, even though doing so would make some systems much more robust and better able to reflect reality.]
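
As a rough sketch of that idea (the homeAddress field, the addresses, and the accessor names are assumptions made up for illustration), such a Person class might look like this:

```java
// A minimal sketch; everything except the DNA code idea is an illustrative assumption.
final class Person {
    // "Essential": fixed at construction. A different value means a different person.
    private final String dnaFingerprintCode;

    // "Accidental": may change over a lifetime without changing who the person is.
    private String homeAddress;

    Person(String dnaFingerprintCode, String homeAddress) {
        this.dnaFingerprintCode = dnaFingerprintCode;
        this.homeAddress = homeAddress;
    }

    String getDnaFingerprintCode() {
        return dnaFingerprintCode;
    }

    String getHomeAddress() {
        return homeAddress;
    }

    // Deliberately the only setter: there is none for the DNA code.
    void setHomeAddress(String homeAddress) {
        this.homeAddress = homeAddress;
    }
}

final class EssentialVersusAccidentalDemo {
    public static void main(String[] args) {
        Person joe = new Person("1234567890", "1 Elm Street");
        joe.setHomeAddress("2 Oak Avenue");   // Joe moved, but he is still Joe

        // Different DNA calls for a new object, never an update to joe.
        Person someoneElse = new Person("9876543210", "2 Oak Avenue");

        System.out.println(joe.getDnaFingerprintCode());   // 1234567890
        System.out.println(joe.getHomeAddress());          // 2 Oak Avenue
        System.out.println(joe == someoneElse);            // false
    }
}
```

The compiler then enforces the philosophical claim: any attempt to assign a new DNA code to an existing Person is rejected.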

The putative method IsTheSame(Object o) would compare either all properties, or only essential properties, of this and o depending on your philosophy.  [This also brings up the whole separate topic of the Java equals() method, and the many potential meanings of “equals” apparent when thinking Philosophically.]
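
The two philosophies can be sketched side by side. This is a hedged illustration, not a recommended equals() implementation: the PersonSnapshot class, its fields, and the two method names are hypothetical, and each object stands for a description of a person at one moment in time.

```java
import java.util.Objects;

// Hypothetical snapshot of a person at one moment, for illustration only.
final class PersonSnapshot {
    private final String dnaFingerprintCode;  // treated as essential
    private final String homeAddress;         // treated as accidental

    PersonSnapshot(String dnaFingerprintCode, String homeAddress) {
        this.dnaFingerprintCode = dnaFingerprintCode;
        this.homeAddress = homeAddress;
    }

    // Leibniz-style sameness: every property must match.
    boolean isTheSameStrictly(PersonSnapshot other) {
        return isTheSameEssentially(other)
                && Objects.equals(homeAddress, other.homeAddress);
    }

    // Essentialist sameness: only the essential properties must match.
    boolean isTheSameEssentially(PersonSnapshot other) {
        return other != null
                && dnaFingerprintCode.equals(other.dnaFingerprintCode);
    }
}

final class IsTheSameDemo {
    public static void main(String[] args) {
        PersonSnapshot joeThen = new PersonSnapshot("1234567890", "1 Elm Street");
        PersonSnapshot joeNow = new PersonSnapshot("1234567890", "2 Oak Avenue");

        System.out.println(joeThen.isTheSameStrictly(joeNow));     // false
        System.out.println(joeThen.isTheSameEssentially(joeNow));  // true
    }
}
```

Under the strict reading Joe stopped being Joe when he moved house; under the essentialist reading he did not.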

More than the sum of its parts

So, the particular individual parts of a thing need not all be “essential” properties of that thing, and hence they may change without affecting that thing’s identity.  (You are still you even if you lose a leg or lung, but not a head). Well then, what are some potential essential properties of an individual thing?  Many advocate taking a look at Aristotle’s “four causes” of a thing, where he defined “cause” as anything that was involved in the creation of that thing.  His two main varieties of causes were intrinsic, for causes that are “in the object”, and extrinsic, for those that are not.  The two sub-varieties of intrinsic causes were material cause (the material the thing consists of) and formal cause (the thing’s form [OOP programmers think Class]).  The two sub-varieties of extrinsic causes were efficient cause (the “who” or “what” that made it happen, or “how”) and final cause (the goal, or purpose, or “why”).

By analyzing the paradoxes using Aristotle’s causes, it can be argued that the Ship of Theseus is the same ship because its form does not change, even though the material used to construct it may vary with time. The Ship of Theseus would also keep the same purpose, that is, transporting Theseus, even as its material changed. The builders and tools used may or may not have been the same; so, depending on how important the efficient cause is to you, that would make more or less of a difference. Giving priority in definitions to some causes over others can thus answer riddles like these.

Furthermore, analyzing the “causes” of a thing’s creation forces one to agree on when a thing actually comes into and goes out of existence, how to tell it apart from other similar things, how to count such things, how to recognize it again in the future, and so forth. Circularly, the causes also provide justifications for those agreements. These criteria for identity help define the sortal definition of the thing (i.e. knowing how to sort these sorts of things from other sorts of things, and being able to count them on the way).

Case Studies: BigBank “Facilities” and Customers

I worked on some projects at "BigBank" (a recently defunct Top-5-in-the-USA bank) where these Philosophy-inspired techniques would have really helped.  Here are two case studies that illustrate the problems of modeling the parts but not the wholes.

In the first case study, BigBank (in order to meet new international banking standards) needed to retrofit its computer systems to record and report on their track record in guessing whether loans would be paid off eventually.  Each guess took the form of a “default grade” for a package of loans, each known as a “facility”.

A major problem was that their various systems did not agree on the basic definition of “facility”. The definition went so without saying that no one had ever actually said, in a rigorous way, what a facility was. Everyone interviewed knew intuitively what one was but couldn’t quite put it into words, and when pressed, it turned out that they all had different definitions from each other. As a result, the various systems around the bank were built with different ontologies (i.e. models of the world). A key problem was that many of BigBank’s systems assumed that Facilities were no more than the collection of their parts, so only the parts were recorded, with no standard place to say things about each Facility as a whole. It therefore came as a surprise to everyone that there had never been any agreement as to which parts belonged to which wholes at which times, nor even when any particular whole Facility came into or went out of existence. Consequently, BigBank had several different “Facility ID”s, none of which agreed with each other, and hence there was no way to definitively report on the history of any particular Facility.
CASE STUDY:  At BigBank, credit grades are calculated for "facilities". A facility is a collection of "obligations" (i.e. loans, lines of credit) that are being considered together as a single deal and graded as a whole. The particular set of obligations grouped into each "facility" changes over time as individual obligations get paid off or expire. Plus, changed or not, the facilities are supposed to be re-graded from time to time.  Unfortunately, some key BigBank databases only had records for individual obligations. There was no Facility entity table.

So, for example, whenever a "facility" was (re)graded, in reality only a set of obligation records was updated, all with the same single “facility-grade”. In fact, other than the loan officer's neurons, there was no record of which obligations had been associated with which "facility" over time. So, when there was a new requirement to store all the grading documents for each facility, there was no place to put them. Even worse, since a Facility entity had never been formally defined, the analysis had never been done to make sure everyone had the same definition of a "facility" (which they didn't).
There was no agreement on what the thing being graded actually was! For some, each individual grading event was considered a "facility" (along with its own "facility ID") because "the grading sheet is what is graded".
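
To make the missing piece concrete, here is a minimal sketch of a Facility modeled as an entity in its own right. Everything in it is an assumption for illustration: the class, the ID formats, the dates, and the grade values are invented and do not reflect BigBank's actual systems. The point is only that the whole gets its own stable ID, its own membership history, and its own grading history while its parts come and go.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical Facility entity: the whole gets its own identity key.
final class Facility {
    private final String facilityId;   // never changes, whatever the parts do
    private final Set<String> obligationIds = new LinkedHashSet<>();
    private final List<String> membershipHistory = new ArrayList<>();
    private final List<String> gradingHistory = new ArrayList<>();

    Facility(String facilityId) {
        this.facilityId = facilityId;
    }

    String getFacilityId() {
        return facilityId;
    }

    void addObligation(String obligationId, LocalDate on) {
        if (obligationIds.add(obligationId)) {
            membershipHistory.add(on + " ADDED " + obligationId);
        }
    }

    void removeObligation(String obligationId, LocalDate on) {
        if (obligationIds.remove(obligationId)) {
            membershipHistory.add(on + " REMOVED " + obligationId);
        }
    }

    // The grade is recorded against the facility, not copied onto each obligation.
    void recordGrade(String grade, LocalDate on) {
        gradingHistory.add(on + " GRADE " + grade + " FOR " + obligationIds);
    }

    Set<String> currentObligationIds() {
        return new LinkedHashSet<>(obligationIds);
    }

    List<String> getMembershipHistory() {
        return new ArrayList<>(membershipHistory);
    }

    List<String> getGradingHistory() {
        return new ArrayList<>(gradingHistory);
    }
}

final class FacilityDemo {
    public static void main(String[] args) {
        Facility facility = new Facility("FAC-001");
        facility.addObligation("OBL-1", LocalDate.of(2008, 1, 15));
        facility.addObligation("OBL-2", LocalDate.of(2008, 1, 15));
        facility.recordGrade("B", LocalDate.of(2008, 2, 1));
        facility.removeObligation("OBL-1", LocalDate.of(2009, 6, 30));  // paid off
        facility.recordGrade("A", LocalDate.of(2009, 7, 1));

        System.out.println(facility.getFacilityId());             // FAC-001
        System.out.println(facility.currentObligationIds());      // [OBL-2]
        System.out.println(facility.getGradingHistory().get(0));  // 2008-02-01 GRADE B FOR [OBL-1, OBL-2]
    }
}
```

With something like this in place, "which obligations were graded together, and when?" becomes a lookup rather than a matter of the loan officer's memory.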

A second case study (which I detailed back in 2006) involves BigBank's treatment of customer information. Some BigBank systems defined Customer entities and assigned a single ID for each one, but other systems gave the same person or corporation a different ID in each state and called them Obligors. Once again, some systems modeled only the wholes (i.e. customers) and other systems only modeled the parts (i.e. obligors). And once again, because the systems working at the parts level did not tie them together as a whole, there was disagreement about which obligors belonged to which customers.  It had become so bad that the data model had to be changed to allow multiple customers to be tied to a single obligor, lest conflicting data feeds go unprocessed. It was like having Person records and BodyPart records, but needing to kludge in the ability to have multiple people associated with the same particular foot!
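
One way to avoid that kludge is to let the whole create its parts, so that no part can exist without exactly one owner. The sketch below is purely illustrative: the class names follow the article's vocabulary, but the ID formats and the rule of one obligor per customer per state are assumptions made here, not BigBank's actual data model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: the whole (Customer) owns and creates its parts (Obligors).
final class Customer {
    private final String customerId;
    private final Map<String, Obligor> obligorsByState = new LinkedHashMap<>();

    Customer(String customerId) {
        this.customerId = customerId;
    }

    String getCustomerId() {
        return customerId;
    }

    // Creating the part through the whole guarantees the link exists from the start.
    // Assumption: at most one obligor per customer per state.
    Obligor openObligor(String state, String obligorId) {
        if (obligorsByState.containsKey(state)) {
            throw new IllegalStateException("Customer already has an obligor in " + state);
        }
        Obligor obligor = new Obligor(obligorId, state, this);
        obligorsByState.put(state, obligor);
        return obligor;
    }

    int obligorCount() {
        return obligorsByState.size();
    }
}

final class Obligor {
    private final String obligorId;
    private final String state;
    private final Customer owner;   // final: exactly one customer, forever

    Obligor(String obligorId, String state, Customer owner) {
        this.obligorId = obligorId;
        this.state = state;
        this.owner = owner;
    }

    String getObligorId() {
        return obligorId;
    }

    String getState() {
        return state;
    }

    Customer getOwner() {
        return owner;
    }
}

final class CustomerDemo {
    public static void main(String[] args) {
        Customer joe = new Customer("CUST-1");
        Obligor joeInNewYork = joe.openObligor("NY", "OBL-NY-7");
        Obligor joeInOhio = joe.openObligor("OH", "OBL-OH-3");

        System.out.println(joeInNewYork.getOwner() == joeInOhio.getOwner());  // true
        System.out.println(joe.obligorCount());                               // 2
    }
}
```

Because the owner field is final and single-valued, the "one foot, several people" situation cannot even be expressed.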

[1] chapter 5, this sentence is false, Peter Cave, 2009, Continuum, ISBN: 9781847062208
[2] Parts, Peter Simons, 1987, Oxford University Press
[3] Introducing Aristotle, Rupert Woodfin and Judy Groves, 2001


  1. I think, initially, that as long as there is continuity, an entity is the same even if bits (teeth, for example) are replaced. Which raises the question: if the Star Trek teleporter existed, would the same people arrive as were transmitted?

  2. Yes, transporter scenarios are common in the philosophical literature of "personal identity" which adds an extra bit of complexity to the same/not-same questions above (because a "person" is not necessarily the same as its human body).

    I highly recommend the movie "The Prestige" which seems to give an example of every what-if scenario in the discussion of "are these two people the same" which is the central question in personal identity.