Tuesday, September 21, 2010

Fuzzy Unit Testing, Performance Unit Testing

The topic of testing came up in my recent post "Is Morality Eating Your Own Dogfood?", which prompted me to finally publish the following notebook entry from 2007...

In reading Philosophy 101, about Truth with a capital "T", and the non-traditional logics that use new notions of truth, we of course arrive at Fuzzy Logic with its departure from simple binary true/false values, and embrace of an arbitrarily wide range of values in between.

Contemplating this gave me a small AHA moment: Unit Testing is an area with the implicit assumption that "Test Passes" is either true or false.  How about Fuzzy Unit Testing, where a numeric value in the 0...1 range reports a degree of pass/fail-ness, i.e. a percentage pass/fail for each test?  For example, algorithms that predict something could be graded on how well the prediction matched the actual value.  Stock market predictions, bank customer credit default predictions, etc., come to mind.  This sort of testing of predictions about future defaults (i.e. credit grades) is just the sort of thing that the Basel II accords are forcing banks to start doing.
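To make the idea concrete, here is a minimal sketch of what such a fuzzy assertion might look like (the FuzzyAssert class and its linear scoring scheme are inventions for illustration; the xUnit frameworks only offer boolean asserts):

/** Hypothetical fuzzy assertion: reports a degree of pass in the 0...1 range. */
public class FuzzyAssert {

    /** Grades a numeric prediction by how close it came to the actual value. */
    public static double assertCloseTo(double actual, double predicted, double tolerance) {
        double error = Math.abs(actual - predicted);
        // 1.0 = perfect prediction; degrades linearly to 0.0 at the tolerance boundary.
        return Math.max(0.0, 1.0 - (error / tolerance));
    }

    public static void main(String[] args) {
        // A credit model predicted a 12% default rate; the observed rate was 15%.
        double degree = assertCloseTo(0.15, 0.12, 0.10);
        System.out.printf("Test passes to degree %.2f%n", degree); // 0.70, not just "FAIL"
    }
}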

Another great idea (if I do say so myself) that I had a few years ago was that there is extra meta-data that could/should be gathered while running unit test suites; specifically, the performance characteristics of each test run.  The fact that a test still passes, but is 10 times slower than in the previous run, is a very important piece of information that we don't usually get.  Archiving and reporting on this meta-data for each test run can yield very interesting metrics on how code changes are improving/degrading the performance of various application features/behaviors over time.  I can now see that this comparative performance data would be a form of fuzzy testing.
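And a sketch of how a test harness might capture that meta-data (the runner and its in-memory archive are invented for illustration; a real implementation would persist the timings between runs):

import java.util.HashMap;
import java.util.Map;

/** Hypothetical runner that archives each test's duration and flags regressions. */
public class TimedTestRunner {
    // Stand-in for an archive of timings from the previous run (test name -> millis).
    private final Map<String, Long> previousRunMillis = new HashMap<>();

    public void runTimed(String testName, Runnable test) {
        long start = System.nanoTime();
        test.run(); // the ordinary pass/fail test
        long elapsed = (System.nanoTime() - start) / 1_000_000;

        Long previous = previousRunMillis.get(testName);
        if (previous != null && previous > 0 && elapsed >= previous * 10) {
            // Still passes, but 10x slower than last run: report it instead of losing it.
            System.out.printf("%s passed but took %dms vs %dms last run%n",
                    testName, elapsed, previous);
        }
        previousRunMillis.put(testName, elapsed);
    }
}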





Tuesday, August 10, 2010

Not All Properties Are Created Equal (Part I: Essentials)

Just a few years ago, I started reading “Philosophy 101” type books, and I was immediately surprised at how relevant the ideas were to day-to-day software development.  So relevant, in fact, that the even bigger surprise is that these ideas are not part of the basic computer science curriculum, nor mentioned in technical books & magazines (with the possible exception of graduate-level artificial intelligence). The following idea was the first I encountered that made it clear to me that programmers need to know what philosophers already know. It was also my first clue that philosophy has a whole body of knowledge about developing data & object models that computer science books leave up to intuition.

Did you know that 2500 years ago, philosophers like Plato and Aristotle were doing Object-Oriented Analysis and Entity-Relationship Modeling? More surprisingly, they were already more sophisticated than software developers are now!   Why?  Well, for one thing, they already understood that the properties of an object are not all created equal.  Whereas programmers today basically think that a property is a property, even ancient philosophers understood that there are several different categories of properties. And most importantly, only some properties define WHAT something is; the other properties merely describe HOW it is. Embracing this distinction will change the way you carve up the world into classes and relationships, and which attributes you assign to which entities.  I have come to feel that creating data models without this distinction is like wearing glasses so out of focus that distinct objects blur together into indistinguishable blobs. 

The properties that something must have, in order to be what it is, Philosophy calls Essential properties. Those properties that something may optionally have are called Accidental properties. For example, for you to be a human, you must have human DNA; it is an Essential property. Being named Smith, however, is an Accidental property because you would still be human even if you had a different name or even no name.  So, our everyday meaning for “essential” is different than the philosophical meaning. If a client says that the address is an essential part of the customer data, that doesn’t mean that it is Essential in the philosophical sense. In fact, it is not Essential because whatever a “customer” is, it will still be that same kind of thing even if its address changes.  The distinction between essential and accidental properties is even embedded into some human languages like Irish where there are two different “is” verbs; “Tá” is used for accidentals like “He is hungry”, but the verb they call The Copula is used for essentials like “He is Hungarian”.

A hallmark of Essential properties is that they are unchanging. An object’s Essential properties can not change without that object becoming a different kind of thing. There is an ancient philosophical paradox of how something can change and yet remain the same. You are different than you were as a child, and yet you are still the same you. As Heraclitus said, "You can’t step into the same river twice because the water is always different."  The solution the Greeks came up with was that Accidental properties may change but Essential properties must remain the same (otherwise, a metamorphosis has occurred!).  This philosophy is known as Essentialism.

Socrates taught that everything has an “essential nature” that makes it the kind of thing that it is. His pupil, Plato, taught that these essences are manifest in ideal “Forms” of which all objects are mere copies. Plato’s pupil, Aristotle, taught that Essential properties were those that defined a Form, and Accidental properties were those which distinguished one individual object from another of the same kind.  Our object-oriented programming notion of Class is analogous to Plato’s Forms. Like a Class, a Form is unchanging and it pre-exists any objects which instantiate it. Naturally, Entity tables are the database equivalent of Forms with their records being the objects.

So, what is using this idea supposed to buy me? I think a case can be made for at least the following:
  1. Better definitions of entities, classes, and relationships result because it forces you to weed out all the non-essentials (pun intended). By striving to understand entity essentials, and not just normalizing data tuples, you will be more likely to accurately model the world.
  2. Better specifications result because there will now be a place to put all those unwritten (even unspoken) assumptions about the nature of the problem domain. Ironically, when gathering requirements and doing analysis, the essential properties of things are often given short shrift because they mostly don't get stored in databases.  Why not? That's right: they don't change!  It is the changeable accidental properties that get stored, while the unchanging essential properties get buried as hardwired assumptions in the programming logic.
  3. Better system interoperability results because universal essential data is separated from local accidental data. The integration of data between a customer system and a patient system and an employee system would be much easier if they had modeled the essential entity, which is Person. When customer and patient and employee are recognized as merely accidental roles of a Person, there is an immediate common entity type to synchronize on rather than widely divergent data tables.
  4. Better identity systems result because life-long identifiers will no longer be confused with changeable properties like names, addresses, and phone numbers.
Examples of using Essentialism
  • Imagine a Person table where we already understand that using the Name column as the primary key is a bad idea, simply because names are not unique.  Some are tempted to create a compound key using name plus some other column(s) like address, phone, etc.  From an Essentialism perspective it is clear that, while the compound key may be unique for the moment, it is composed of accidental properties and hence can change at any time!  Stored references to previous keys will fail. Current keys won’t match future keys.



    We want keys built from essential attributes that remain fixed for the lifetime of the Person. A unique, fixed, objective, essential attribute like the person’s complete DNA sequence would do the trick! However, a government-issued tax ID like the SSN can be a practical substitute, and it’s better than a proprietary “customer ID” because it is effectively universal, and can therefore be used to integrate databases from multiple sources.

    Lest you think this example is contrived, I witnessed a Top-5-in-the-USA bank design a customer identity system using a key composed of accidental properties rather than the SSN, because “we have not traditionally collected SSNs” (despite government “know your customer” security laws requiring it!).  They had to continually fix their ever-changing data because they focused only on having a “unique key”.

  • In tutorials for any technology that uses entities, the example of a “customer” entity is almost a cliché. The designs of databases, XML schemas, UML diagrams, SOAP messages, Java Classes, etc., have all used it.  But when we ponder the essential nature of a “customer”, there is an immediate problem…



    Philosophers have devised many systems for organizing “what exists”, and one of the first of their “20 questions” is: Is its essence physical or abstract? A customer can be either a person (a physical thing that exists in time and space) or a corporation (an abstract thing that doesn't exist in space), so which is it?  This is our clue that it isn’t an entity at all.  With some thought (and the benefit of reading more philosophy), it becomes clear that “customer” is really just a role that different entities can play.  It is part of the relationship between the entity acting as customer/client/buyer and the entity acting as vendor/seller/provider, as the sketch below illustrates.
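Here is a minimal sketch of that separation in code (all class names are invented for illustration):

/** Marker for any entity capable of playing a part in a buyer/seller relationship. */
interface Party { }

/** Entities with an essential nature: a person (physical) and a corporation (abstract). */
class Person implements Party {
    final String dnaId;                        // essential: never changes
    Person(String dnaId) { this.dnaId = dnaId; }
}
class Corporation implements Party {
    final String charterNumber;                // essential to the corporation
    Corporation(String c) { this.charterNumber = c; }
}

/** "Customer" is not an entity; it is a role within a relationship between parties. */
class CustomerRelationship {
    final Party customer;  // whichever entity is acting as customer/client/buyer
    final Party vendor;    // whichever entity is acting as vendor/seller/provider
    CustomerRelationship(Party customer, Party vendor) {
        this.customer = customer;
        this.vendor = vendor;
    }
}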
Whatever interpretation you would give about the essential-ness of this or that property, the main point is that it is worth knowing that there IS a distinction. And more generally, Philosophy has some ideas to which we programmers need to be exposed.

Sunday, August 8, 2010

Neural Nets, Vagueness, and Mob Behavior

In response to the following question on a philosophy discussion board, I replied with the short essay below and reproduce it here.
"It was then that it became apparent to me that these dilemmas – and indeed, many others – are manifestations of a more general problem that affects certain kinds of decision-making. They are all instances of the so-called ‘Sorites’ problem, or ‘the problem of the heap’. The problem is this: if you have a heap of pebbles, and you start removing pebbles one at a time, exactly at what point does the heap cease to be a heap?"
VAGUE CONCEPTS
This leads to the entire philosophy of "vagueness", i.e. are there yes/no questions that don't have a yes/no answer? Are some things, like baldness, vague in essence, or is our knowledge merely incomplete? E.g. we don't know the exact number of hairs on your head, and/or we don't know (or agree on) the exact number of hairs that constitutes the "bald" / "not bald" boundary.

NEURAL NETS
My personal conclusion is that there ARE many vague concepts that we have created that are tied to the way our brains learn patterns (and, as a side effect, how we put things into categories). In contrast to rational thought (i.e. being able to demonstrate logically step by step our conclusions), we "perceive" (ala Locke/Hume/Kant) many things without being able to really explain how we did it.

In Artificial Intelligence, there are "neural network" computer programs that simulate this brain-neuron style of learning. They are the programs that learn how to recognize all different variations of a hand-written letter "A" for example. They do not accumulate a list of shapes that are definitely (or are definitely not) an "A", but rather develop a "feel" for "A"-ness with very vague boundaries. They (like our brains) grade a letter as being more or less A-like. It turns out that this technique works much better than attempting to make rational true/false rules to decide. This is the situation that motivates "fuzzy logic" where instead of just true or false answers (encoded as 1 or 0), one can have any number in-between, e.g. 0.38742 (i.e. 38.7% likely to be true).
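A tiny sketch of the difference (the membership function and the hair-count threshold below are made up for illustration):

/** Hypothetical fuzzy "membership function": baldness as a degree, not a verdict. */
public class Baldness {
    static final int FULL_HEAD = 100_000; // assumed typical hair count

    /** Returns a degree of baldness in the 0...1 range instead of a yes/no answer. */
    static double degreeOfBaldness(int hairCount) {
        if (hairCount >= FULL_HEAD) return 0.0;
        // No sharp bald/not-bald boundary: membership rises smoothly as hair thins.
        return 1.0 - ((double) hairCount / FULL_HEAD);
    }

    public static void main(String[] args) {
        System.out.println(degreeOfBaldness(61_300)); // prints roughly 0.387, i.e. "38.7% bald"
    }
}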

WISDOM OF THE CROWD?
Because each person has their own individually-trained "neural net" for a particular perception (e.g. baldness, redness, how many beans are in that jar?), we each come up with a different answer when asked about it. However, the answers do cluster (in a bell-curve-like fashion) around the correct answer for things like "how many beans".  This is what originally led Galton to think that there was "wisdom in the crowd".   This idea has been hailed as one of the inspirations for the new World Wide Web (aka Web 2.0). The old idea was that McDonald's should ask "you want fries with that?" to spur sales. The new Web 2.0 idea is that Amazon should ask you if you want this OTHER book, based on what other people bought when they bought the book you are about to buy. I.E. the crowd of Amazon customers knows what to ask you better than Amazon itself.

The problem is that there are many failures of "crowd wisdom" (as documented in the Wikipedia article on the "wisdom of the crowd"). My conclusion is that most people advocating crowd wisdom have not realized that it is limited to "perceptions". Many Web 2.0 sites instead ask the crowd for rational judgments, expecting them to come up with a better answer than individuals. The idea of democracy (i.e. giving you the right to vote) has been confused with voting guaranteeing the best answer, no matter the question. In fact, Kierkegaard wrote "Against The Crowd" in the mid-1800s, recognizing that individuals act like witnesses to an event, whereas people speaking to (or as a part of) a crowd speak what we would now call "bullshit" because they are self-consciously performing for that crowd. We can see this in the different results of an election primary (a collection of individuals in private voting booths) versus caucuses, where people vote in front of each other.  So, Web 2.0 sites (Facebook, MySpace, blog Tag Clouds, etc.) that let people see the effect their words have on others are chronicling mob mentality rather than collecting reliable witness reports.

BTW, I have written several blog posts related to vagueness, for example:
http://existentialprogramming.blogspot.com/2010/03/model-entities-not-just-their-parts.html





Wednesday, May 19, 2010

A hole for every component, and every component in its hole

Amongst the surprisingly simple ideas that aren't so simple when thought about Philosophically are holes.  There is a small library of publications on what exactly holes are.  A summary can be found online in this Stanford Encyclopedia of Philosophy article on the metaphysics of holes.  One of the viewpoints it cites is: ‘There is no such thing as a hole by itself’ (Tucholsky, 1930).  This reminded me of one of my very first blog posts from 2000 which I reprint here...

There is no such thing as a Component

I maintain that there is no such thing as a Component in the same way that there is no such thing as a donut hole. Just as the donut hole doesn't exist without a donut to define it, a Component doesn't exist without a Framework to define it. Using a printed circuit board as a metaphor for a framework, it's the "sockets", into which IC chips are meant to be plugged, that define components. So called universal or standalone components are meaningless (and certainly useless) without some framework that expects components of the same purpose and interface.

Ok, so what's your point? The point is that too many developers (and books on the subject) think about components as standalone chunks of functionality that can be "glued together" after the fact. They don't realize that the framework has to come first and foremost in conception and design. Szyperski doesn't get around to talking about frameworks until chapter 21 of his Component Software book for heaven's sake.

Even physical components are like this. The prototypical component, the IC chip, always was designed within a family of chips that were meant to work together. They all needed the same voltage levels for zeroes and ones and tri-states, the same amperage levels, the same clock rates, etc, etc. Other families used other voltage levels. The first reusable, interchangeable parts in history were for rifles. They were meant to be easy and quick to replace (as opposed to the hand crafted muskets they were replacing) but they were meant specifically to make rifles!

Rummaging around a garage, you could find all sorts of "widgets" and "gizmos" that you might guess are components of something, but unless you know what framework they were meant to be a part of, they are not good for anything but door stops or paperweights. In other words, random components don't tend to fit together or work together.

Too many people are trying to make "universal" components without realizing that those components still work within some framework that allows them to be put together and communicate with each other. The problem is that other people doing the same thing have defined other "generic" frameworks that are nonetheless incompatible.

For example, the toys that baby boomers played with when they were young abounded with generic frameworks of universal components: Tinker Toys, Lincoln Logs, Erector Sets, LEGOs. They all had universal components within a generic framework that let you build anything. BUT, you couldn't mix Tinker Toy parts with Erector Set parts (without glue or duct tape).

Ah, you say, that's why I like duct-tape, weakly typed languages like Perl that let me glue parts together. Also, what about Play-Doh?! You could stick anything together with that! Yes, but there was a reason you made bridges out of Erector Sets instead of Play-Doh, and the same reasons apply to software systems (but strong versus weak typing is another discussion).

Objects versus Components

Until I had this epiphany about components as donut holes, I didn't have a good answer to the question "what's the difference between an object and a component?". I now understand that all objects ARE components, but not all components are objects. The framework that defines a set of components does not have to be an object oriented framework. But all object oriented languages define an object framework. They are generic enough frameworks that any objects programmed in that language may inter-operate with each other. Unfortunately though, as with Tinker Toys and Lincoln Logs, Java objects typically can't interact with Smalltalk objects.

In the Java language there are at least two levels of object framework. There are plain old Java objects (POJOs) and there are so-called JavaBeans. Whereas any property of a POJO can be accessed (assuming it's not protected by the "private" keyword) via a fooObject.barProperty syntax, only special properties may be accessed via the JavaBeans framework. JavaBeans are those objects that have defined special property accessor and mutator methods of the form getBarProperty() and setBarProperty(). "JavaBean" is the name given to any component that works within that specialized framework. To make matters confusing, however, it turns out that JavaSoft called more than one framework "JavaBeans" (arrgh!). There are even more specialized versions of JavaBeans that are made to work with fancy GUI toolkits.  And of course, they caused even further confusion by calling yet another (different) "widget", from yet another (different) framework, a JavaBean: the Enterprise JavaBean! So, without clearly focusing on frameworks, even JavaSoft confuses different component types with each other!
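A minimal sketch of the two levels (hypothetical classes):

/** A plain old Java object (POJO): the public field itself is the property. */
class PojoPoint {
    public int x;                             // accessed directly: point.x
}

/** A JavaBean: the property exists only through the get/set naming convention. */
class BeanPoint {
    private int x;
    public int getX() { return x; }           // accessor
    public void setX(int x) { this.x = x; }   // mutator
}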

The moral? Don't fret that there is no such thing as a truly "universal" component. Don't spend energy trying to build them, or building "single universal" frameworks. Focus on what is needed for your situation and design a well crafted framework first and foremost. If it needs to work with other frameworks (like whatever Microsoft builds that won't integrate with anybody else), understand that framework bridges will be needed. It is the rare case that a mere "socket adapter" will suffice.



Sunday, May 16, 2010

Is Morality Eating Your Own Dogfood?

There are two schools of thought about whether programmers should have to write tests to verify their own code (in addition to writing the code itself). The philosophies of economics, psychology, and morality all overlap in studies that show how readily people will abandon moral responsibilities if they are given ways to avoid the stigma of doing so. This leaves me feeling more justified in my belief that programmers do a poorer job of reading, understanding, and implementing a specification when someone else has the responsibility of verification.

Changing the rules changes people’s attitudes


There is a 1998 experiment[1] that keeps popping up in the new “freakonomics”-type literature[2][3] (e.g. Economics 2.0 [4]), in which a controlled subset of Israeli day-care centers started charging a fine for parents who came late to pick up their children. To everyone’s surprise, the number of parents showing up late almost doubled. Additionally, when the fine was later dropped, the number of late parents stayed at the new high level. The theory is that the moral responsibility parents felt to be on time was much stronger than the economic cost of paying a fine, which the parents rationalized as a fee, thus removing the stigma of being late. The fee made it “just business”. As Professor Michael Sandel summarized[5]:
“So what happened? Introducing the fine changed the norms. Before, parents who came late felt guilty; they were imposing an inconvenience on the teachers. Now parents considered a late arrival a service for which they were willing to pay. Rather than imposing on the teacher, they were simply paying her to stay longer. Part of the problem here is that the parents treated the fine as a fee. It’s worth pondering the distinction. Fines register moral disapproval, whereas fees are simply prices that imply no moral judgement.”
Don’t make “not my job” “just business”

The blind experiment is a well-established doctrine in science, requiring that people not know too much about something they are testing, because otherwise the results are often biased. Scientists gathering their own raw data (not to mention interpreting their own data) often get the results they expected to get, where objective outsiders don’t. On that theory, it is argued that software development projects should engage external, objective “QA testers” to develop and administer test suites against the code produced by the “programmers”. Since many programmers don’t like to eat their spinach, ahem, write their own tests (or documentation for that matter), there are not usually arguments against the idea.

From my experience though…
  • Programmers will suffer peer pressure and social costs if they fail their own tests (which is a good thing).
  • Programmers who understand that they are obligated to deliver testable components will do so more often if they must actually produce the tests themselves, compared to those for whom testing is “not my job”.
  • The act of writing a test forces a clearer understanding of both the interface and the implementation of the tested component compared to just programming it.
  • Programmers who fail their own tests will be much more likely to change that component’s implementation if needed, rather than obstinately maintaining that the externally-produced test is wrong.
  • Writing your own tests is the most systematic method of “eating your own dogfood”.
Belt and Suspenders

I say, both independent testers AND the original programmers should each write independent test suites. This way you get the power of both perspectives. This is an old lesson from the days of computer punch-cards: two different people key-punched the same data so that the two decks could later be compared, thus eliminating most typos.

[1] "A Fine is a Price", Uri Gneezy and Aldo Rustichini, Journal of Legal Studies, vol. XXIX (January 2000)
http://rady.ucsd.edu/faculty/directory/gneezy/docs/fine.pdf

[2] Brain food: when does a fine become a fee?, Aditya Chakrabortty, The Guardian, Tuesday 23 February 2010
http://www.guardian.co.uk/science/2010/feb/23/brain-food-fines-and-fees

[3] Why an L.A. Times wikitorial effort went wrong, Clay Shirky, O'Reilly Media Gov 2.0 Summit, 2009-09-09
http://itc.conversationsnetwork.org/shows/detail4411.html

[4] Economics 2.0, Norbert Haring, Olaf Storbeck, Palgrave Macmillan, 2009, p. 7

[5] Michael Sandel, The Reith Lectures 2009, BBC, 9 June 2009
http://www.bbc.co.uk/programmes/b00kt7rg

Tuesday, May 11, 2010

The purpose of a thing is in US as well as in IT

In an earlier post, I advocated adopting Philosophers' practice of considering the purpose of a thing when creating a definition for that thing.  Plato and Aristotle would have said that one of the things that made an acorn an acorn was that it had the "goal" or "purpose" of becoming an oak tree.  In defining a domain model (aka business object model), document the "purpose" of a class in order to get at its true attributes and behavior. But, as I was recently reminded, not only can the purpose of a thing be "in the thing itself", it can also be solely in our minds.  I.E. it raises the question: if we are defining a class of things, what is our purpose in caring whether something is one of those things?

I had this AHA moment after reading the article "Unclassified" in the June 2010 issue of Discover magazine, where I was surprised to learn that there is no accepted universal definition of a biological species; there are at least 20 competing definitions.  I had thought that "being able to breed fertile offspring" was the definition, but that is only one (and of course it leaves out the vast majority of living things on earth, which reproduce asexually).

After having read about all the conflicting ways to organize and cluster individuals into species, each one with its own way of looking at things, the question arose: Why do you want to know? I.E. what is the purpose of knowing which species something is?  Depending on why you want to know, you choose one definition over all the others.

But of course as Darwin thought, this would mean that species are not "real". Instead of discovering pre-existing forms, we would merely be inventing arbitrary sets of attributes-in-common. Therefore, unlike the "teleology" of Plato and Aristotle where the "purpose" or "goal" of a species is internal to itself, it would seem that a possibly more important purpose is the one that WE have in wanting to place a particular into that species.

For example, ponder all of the various shapes, sizes, forms, etc. of things that you would want to call a "chair" [and do a Google image search of "chair"]. Now, ponder coming up with a universal definition of chair (such that all chairs would be recognized as such, and nothing else would be), and you will see that it will be much easier if you can refer to the purpose we have for them, i.e. being able to sit (comfortably?) on them.  Without that, it is hard to distinguish between a storage box (not a chair) and a storage bench (a chair).  [Try it. Do a Google image search for storage box and then storage bench.]

In this sense, a species would be more like a Java Interface than a Class.  Classes usually embody the pre-existing forms viewpoint, i.e. the notion that attributes and behavior are really "in the thing" rather than merely "how we want to look at it".  And while in practice Interfaces are often just wrappers around class definitions, ideally, each Interface defines a standard socket into which an object of any "form" may fit, as long as it can perform a certain "role" and participate in a certain "protocol" (see my definition of component).
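Here is a sketch of the chair example in these terms (the names are invented; the point is that the Interface captures our purpose, not the object's form):

/** The role/protocol: anything we can sit on can play the part of a chair. */
interface Seat {
    void seatOccupant(String occupant);
}

/** A storage bench is a box by form, yet it can perform the Seat role... */
class StorageBench implements Seat {
    public void seatOccupant(String occupant) {
        System.out.println(occupant + " sits on the bench lid");
    }
}

/** ...while a storage box of much the same form offers no such protocol. */
class StorageBox {
    void store(String item) { System.out.println(item + " stored"); }
}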

SO, the lesson to learn is: When considering the purpose of a thing as a part of its definition, "purpose" is both its purpose, and our purpose for wanting to recognize one in the first place.





Wednesday, April 21, 2010

Class Constructors considered harmful

PREFACE: There is a school of thought in computer programming that Classes are harmful in certain ways, but the bases for those arguments are very specific to programming technique.  This essay is about a much more general, mindset-oriented objection.

My basic project these days is to learn Philosophy 101 and contemplate what impact that knowledge should have on software development practices.  After reading topics like the Philosophy of "becoming" and Aristotle's "four causes", it is clear that Philosophers spend much more time trying to define the circumstances of an entity's creation than systems analysts and programmers do. While Philosophers have long been concerned with WHY did WHO do WHAT to cause an object to come into being, programmers are mostly concerned with the mechanics of HOW an object should be constructed.  It strikes me that this is due to the tunnel vision encouraged by the class constructor method.  [And, due to destructors and automated garbage collection, an even worse situation applies to object “death”.] So, to paraphrase a famous title, I (albeit tongue-in-cheek) consider Class Constructors harmful.
CASE STUDY: At a recently defunct Top-5 bank, I uncovered the fact that there were major incompatibilities in the data being used to produce credit scores for borrowers and their loans. The scores depended on a grade generated for a "facility". The problem was that the very concept and definition of "facility" was not the same in various parts of the bank. If they had answered the following simple questions, they would have realized that they were not talking about the same thing: When and why does a new facility come into existence, and when/why does it cease to exist?
In the course of analysis and requirements gathering, a major goal is to identify and define the various business domain entities. But, in a vicious cycle, the definition of an entity is often too shallow with regard to its birth and death, because “causal” information often stagnates as mere background text in some requirements document.  And that is because programmers have no standard place to put that logic in the code.

Object-oriented practice has one put "all" the properties and behavior associated with an entity into its Class definition, where "all" for class Dog means "the dog, the whole dog, and nothing but the dog"[1].  However, the "nothing but the dog" constraint means that the logic involved in deciding whether an instance of class X should be created is not normally a method of class X.  Since the "cause" of X's instantiation usually involves other classes, that logic lies outside of X proper; thus, standard development methodologies leave the analysis of causation and purpose out of the design of class X.  Even Factory classes are focused on object construction, rather than on why it should be constructed, and why now, and by whom.

The Mediator pattern comes to mind as a place to put this sort of logic because it involves multiple classes.  The logic that decides it is time for a Sculptor to use a Chisel to carve a Statue out of a block of Marble doesn’t belong solely in any of those classes.  A programmer would be tempted to put a “carve” method in Sculptor since that is a “behavior” of the sculptor, but Philosophy considers it an essential part of the definition of the statue itself.  And that is a problem with Mediator classes in the first place; the desire to have everything relevant to X be “nearby in the source code” (a raison d'être for classes) is defeated when some of it is off in various Mediators.  Having the teleology of Statue off in some CarveMediator isn’t much better than it residing in Sculptor.carve().

With event-driven systems (e.g. MOM, SOA), the series of events that trigger the creation of an entity instance may be complex. And whether event-driven, or "batch processing", the sub-systems are often distributed, increasing the value of encapsulating this logic in a single place. With Java EE and service oriented designs, there would be value in having the entity services include this logic.
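As a sketch of what that could look like, imagine a hypothetical entity service that owns not just HOW a facility is built, but WHEN and WHY it comes into existence and ceases to exist (all names invented):

/** Hypothetical entity service encapsulating the causal lifecycle of a Facility. */
class FacilityService {

    /** WHO (the borrower), WHY (an approved credit line), WHAT (a new Facility). */
    Facility openFacility(String borrowerId, double approvedCreditLimit) {
        if (approvedCreditLimit <= 0) {
            throw new IllegalStateException("no cause for a facility to come into being");
        }
        return new Facility(borrowerId, approvedCreditLimit);
    }

    /** Equally important, and equally neglected: when/why does it cease to exist? */
    void closeFacility(Facility facility) {
        facility.closed = true;
    }
}

class Facility {
    final String borrowerId;
    final double creditLimit;
    boolean closed;
    Facility(String borrowerId, double creditLimit) {
        this.borrowerId = borrowerId;
        this.creditLimit = creditLimit;
    }
}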

In any event, I believe that there is a need to learn from Philosophy that their concept of "form" (which is the equivalent of OOP's class) has always included the purpose of a thing as well as its blueprint.

[1] Object Oriented Analysis and Design, Grady Booch,  1991

Monday, April 19, 2010

Silver Bullet: Model the world, not the app

DISCLAIMER: Ok, I admit it...this is cut/pasted directly from my brain fart notebook, i.e. not ready for prime time...but dammit Jim, it's just a blog!

The arsenal needed to fight unsuccessful software development projects will take a whole clip full of silver bullets.  One of those silver bullets, I believe, is more accurately modeling the world using knowledge of Philosophy.

There is a great struggle between getting everything "right" up front,  versus, doing "just enough" specification and design.  When trying to balance "make it flexible" in order to support future re-use, versus XP mandates like "don't design what isn't needed today", it is hard to know (or justify) where to draw the line.  Due to "changing requirements", those "flexible reuse" features (that were merely contingent at design time) are often mandatory before the original development cycle is even complete.

WELL, lots of requirements don't change THAT much if you are modeling correctly in the first place.

Humans haven't changed appreciably in millennia, even if the roles they play do.  So, if "humans" are modeled separately from "employees", it is that much less work when you later need to integrate them with "customers". [Theme here is "promote roles programming", the justification of which is made more obvious when taking essentialism to heart.]

In general, the foundation of one's data/domain/business/object/entity-relationship model is solid and unchanging, if all "domain objects", "business objects", etc are modeled based on a clear understanding of the essential versus accidental aspects of the "real world", and NOT based on the requirements description of a particular computer system or application.  Modeling based on "just what is needed now according to this requirements document today" is too brittle, both for future changes, and especially for integrating with other systems and data models.

After all, adding properties and relationships to entities is fairly easy if the entities themselves are correctly identified.  It is much harder to change the basic palette of entities once a system design is built upon them.  All the more reason to be sure not to confuse entities with the roles they can take on.

Example: I don't have to wonder who I might have to share employee data with if I realize that an "employee" is actually just a role that a person takes on.  If I model the essentials of the person separately from the attributes of the employee role, it will be much easier to integrate that data with, say, a "customer" database later.  If the customer data model recognizes that "customer" is just a role that a person takes on, its Person table is much more likely to be compatible with my Person table than would be the case with my naive Customer and their Employee tables (and still other Patient tables, etc, etc.)
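In code, the separation might look something like this (hypothetical names):

/** The essentials of a person, shared by every system that deals with people. */
class Person {
    final String personId;  // life-long identity, independent of any role
    String name;            // accidental: free to change
    Person(String personId) { this.personId = personId; }
}

/** Role attributes live apart from the entity; each role points at the common Person. */
class EmployeeRole {
    Person person;
    String payrollNumber;
}
class CustomerRole {
    Person person;
    String accountNumber;
}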

Tuesday, April 13, 2010

Do Objects Have Souls?

While writing the article, Implementing "Real" Classes in JavaScript for Developer.com, I was tempted to add a sidebar with the provocative title "Do Objects Have Souls?".  The article itself demonstrated a technique for simulating Java-like classes in JavaScript, and as introductory material, it explained the difference between Java's "class"-based semantics versus JavaScript's "object prototype"-based semantics.
For Java programmers: In a nutshell, Java creates object instances that have all, and only, the properties of their Class, and the property list is fixed over the life of the object. In JavaScript, object instances have no "real" class; instead, each object is free to add and delete properties at any time.  Because of this, JavaScript Object objects are pretty empty at birth compared to Java Object objects, because they can become anything after they have already been instantiated [hence Existential Programming!].
In trying to understand the language differences myself, I began musing on the parallels between philosophical notions of "the soul" and JavaScript's empty shell of an "object" that is generated by obj = new Object;

Souls as property containers

In western philosophy there is a 2500-year-old school of thought that "things" (aka objects) have properties, some of which can never change (i.e. essential properties) and some of which may change over time (i.e. accidental properties).  One concept of "soul" is that it is the bundle of essential properties that constitute a thing.  This idea has also been equated with "identity".  Attached (non-permanently) to that attribute bundle are the various accidental properties.  This sounds a lot like the "empty" JavaScript object, which stands ready to add and update and delete [accidental] properties, while all the time keeping constant the essential, unchanging object identity (as referenced by obj!).

When medieval alchemists distilled liquids into their essences, they called them spirits because they were the "soul" of the grape, herb, flower, etc. To ensure removing all of the accidental properties, they distilled things 5 times to produce the quintessential spirit.  Distilled spirits in the alcohol sense often have names that reflect the notion that they have a life essence captured in them; Whiskey and Aquavit are both names that translate to "water of life" in their original languages.  In the movie Perfume, a villain repeatedly attempts to distill the essence of pretty women using the same techniques as distilling flowers into perfume. [Spoiler Alert: flowers don't survive the process...]

When the Well of Souls runs out of RAM

Another aspect of souls that rhymes with JavaScript is the ancient lore that newborns are given a soul at birth which is plucked from a "well of souls" (aka the chamber of Guf).  In JavaScript, as empty objects are created and given an identity, they are plucked from a heap of available memory (i.e. dynamic memory allocation). In both cases, bad things happen when there are none left.

When the well of souls runs dry, the Messiah will come and reboot the world; when your browser runs out of heap space, your JavaScript will gag and someone will have to come and reboot the browser (or at least the web page).  The plot of the 1988 Demi Moore film, The Seventh Sign, is based on the Guf mythology. Demi's baby is due to be born on February 29 which is the date on which the last soul will leave the Guf and it will be empty.

Tuesday, March 30, 2010

Moore's Paradox... I'm just saying!

The popular phrase "I'm just saying" has been around long enough for most people to have heard it, but not long enough for it to be well-documented as to where it originated.  I heard a great stand-up comic bit about it in the 1980's by Paul Reiser. There are several blog sites that muse over its origin and solicit theories.
It turns out that the most common definition of the phrase exhibits a logical paradox from Philosophy.  The book "this sentence is false" is a collection of philosophical paradoxes, and it describes Moore's Paradox (as developed by G.E. Moore).  I summarize it as follows:
Normally, everything that can be said about the world can be said by anyone. I can say the moon is made of green cheese, and you can say it.  The state of the world described by me can equally be described by you with no logical paradox...EXCEPT... I can say that the moon is made of green cheese, and I can say that you do not believe that the moon is made of green cheese, but YOU can not say the same thing.  I.E. you can not say that X is true and at the same time say that you do not believe that X is true.  Note that you are not saying that you could be wrong in your belief; you are saying both that X is true, and that you do not believe X is true, at the same time. A logical contradiction.
However, whenever you use the phrase "I'm just saying!", you are in effect performing Moore's paradox.

Monday, March 22, 2010

Don't Call Names...It IS polite to point!

Mom always said "Don't call names!" and "It's not polite to point!".  Ok, so how am I supposed to refer to "you know who" over there?  Too late! I already "pointed" verbally when I said "over there".  But, not only is that ok for programmers, it is actually preferable to point rather than to use names. And a half-century before computers even existed, Philosophers already knew this.  So why are programmers still calling names?
A bit of background first for programmers...
The philosophical use of the word "reference" (and hence "refer") has a subtle technical meaning that, luckily for us, corresponds with the object-oriented technical term "reference".  In Philosophy, the only way words can say something about the real world is via "reference". In Java programs, the only way to say something about an object is via an "object reference".  In other programming languages it is via a "pointer".  Interestingly, according to the 20th century philosopher Bertrand Russell, the only way one can truly refer to a thing (using language) is via a "demonstrative" (i.e. pointer words like "this", "that", "those", "these").

Contrary to previous thinkers, Russell held that proper names (e.g. Joe Blow) do not "refer".  OOP programmers can relate to this because a reference (or pointer) to a Person object can access that object's properties, but the string "Joe Blow" can not...

Person p = new Person("Joe Blow");   // get an object reference "p"
p.weight = 175; // THIS WORKS!
"Joe Blow".weight = 175; // THIS DOESN'T WORK!

Now, the string/name "Joe Blow" could be used in a query that describes a Person object and returns a reference to it. And WAAAY before computers, Russell said the same thing.  Names are a description of something and not a reference to it.
(Descriptivist) Philosophers declared that names were not references because there are a number of problems in logic that arise if they are.  I have written about several of these in earlier blog posts, but in a nutshell:
  • names can change even though the object doesn't (e.g. maiden names)
  • names can have meaning over and above referencing an object (e.g. Superman vs Clark Kent)
  • objects can have more than one name (e.g. Morning Star vs Venus)
  • names can be given to objects that don't actually exist (e.g. Unicorn)
  • not every object has a name (e.g. that piece of paper over there)
While OOP programmers may know that a pointer or reference to an object is different than a "name",  many database designers haven't absorbed that yet.  Of course, they can be forgiven somewhat because the relational database model does not really give them references or pointers.  The only way to access the properties of an object (aka entity) is via a query (and hence a description).  This has led to the common practice of creating an artificial property (aka surrogate key) that can be made to have a unique, unchanging value for every different object/entity, and is a close substitute for a "reference".

On the other hand, there is also the practice of using the name property of an object (or any other real world properties) as a reference mechanism (aka natural key), and so naturally there is great debate about whether this is ok or not, and when to use one or the other.

Philosophy would counsel (as would I) to not call names (i.e. don't use natural keys), and don't use artificial keys that the world knows about (like Social Security Numbers, because even those have duplicates!). Below are a few case studies of problems arising from name-calling rather than pointing.  They share a base problem: the natural key data is almost never "essential" in the philosophical sense; i.e. the data is capable of changing over time even though the object is considered to be the same object.
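Here is the difference between pointing and name-calling in miniature (a sketch; the store and its key scheme are invented for illustration):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Identity via an artificial surrogate key; the name is just a changeable description. */
class PersonRecord {
    final String key = UUID.randomUUID().toString(); // internal, unchanging, meaningless
    String name;  // a natural-key candidate, but accidental: it WILL change
}

class PersonStore {
    private final Map<String, PersonRecord> byKey = new HashMap<>();

    void save(PersonRecord p) { byKey.put(p.key, p); }

    /** "Pointing": follow the reference; immune to renames. */
    PersonRecord get(String key) { return byKey.get(key); }

    /** "Name calling": a query by description; may match none, one, or many. */
    List<PersonRecord> findByName(String name) {
        List<PersonRecord> hits = new ArrayList<>();
        for (PersonRecord p : byKey.values()) {
            if (name.equals(p.name)) hits.add(p);
        }
        return hits;
    }
}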

Case Study: Yahoo Bookmarks

It turns out that once a bookmark is created on the Yahoo Bookmarks site, there is no way to change the URL associated with that bookmark.  Someone decided to use the URL as a natural key (which by definition should never change), so the URL can't be edited.  The problem is that a "bookmark" (by my thinking) is not synonymous with a URL.  It is a marker that enables me to return to a web page.  With web sites being revamped all the time, and most URLs not being "permalinks", the same page can have its URL change over time.  If I need to update the URL, Yahoo makes me delete the old bookmark, create a new one, and re-enter the name, comments, etc. from scratch.

Case Study: Qlubb site names

There is a web portal where one can create free web sites for small organizations (i.e. clubs aka qlubbs).  To create a site, you select a club name and then customize the generic site created for you.  However, in the help page, they warn that there is no way to change the name of your club once it is created because
"We currently do not support the ability to change the Qlubb name as there may be database consistency risks. However, if your Qlubb really want to change the name, please have a Qlubb administrator send a note to help at Qlubb with your request. We will evaluate each request and perform the change manually, if it is safe to do so."
Obviously someone mistook a name for a unique and unchanging key.

Case Study: Semantic Web

For all those programmers groaning at the last example, asking how anyone can still make that old mistake in this day and age, the same thing goes on in the new frontier of the Semantic Web.  In most tutorials on the Semantic Web, or tutorials for logic programming languages like Prolog, and yes, even in my youthful whack at a semantic network database, names of things are confused with references to those things.  As I wrote about the flaw in my SN database back in 2007, this causes real problems in all the ways that Philosophers recognized a century ago.

Thursday, March 11, 2010

Some Parts have a (sex) life of their own

In heeding my earlier post which advocates that "wholes" be analyzed and modeled, rather than just their "parts", there is the danger of swinging to the opposite extreme of not considering parts as "individuals".  After all, if one can model an entire Car as having a paintColor property, with a simple value of Red, then there is no need to model Paint as an entity on its own, much less PaintMolecule or Atom.

When the parts of X are considered "stuff" instead of "things" (e.g. paint versus wheels), or, when their abstraction level is so low that they are interchangeable (like WHICH carbon atoms are in the paint), it is usually concluded that they are outside the scope of X’s model because they make no difference at the level of the whole X.  And certainly, if a whole has a property, there is no need to have each part redundantly carry the same property just to parrot the value of the whole.  For example, the case study in that earlier post chronicled the “BigBank” database, in which the record for each obligation in a facility redundantly recorded the same single “facility grade”, even though they were always graded as a whole facility.

The problem, of course, is knowing which parts are relevant at the level of the whole (i.e. the granularity of the model). In recent science news, an example of this quandary has arisen which illustrates how parts that were considered “stuff” are really individual things. It turns out that the rare chicken who grows up half-male and half-female does so in a way that is different from humans and other mammals.

Until now, it’s been assumed that body parts (and hence their constituent cells) always grew up male or female because external hormones told them to, and told them to as a whole.  The Nature paper announced the discovery that each cell in a half-and-half chicken has maleness or femaleness as an intrinsic property from birth. In mammals, if you transplant a formerly “male” cell into a “female” organ, it will start working as a female, whereas in chickens it will keep acting male.  To switch metaphors: It turns out that the car wasn’t painted red and green after the fact; each car part was already red or green ever since its manufacture, and each is resistant to change. So, cells/parts have to be modeled individually if you are building chickens/cars.

Of course, if you are only collecting (and not building) cars, you might still think of color as a property of the car as a whole. But here is the sticky part…a painful situation occurs when the car collector later needs spare parts, but their inventory database was never designed to track part colors.  I have often encountered this situation in corporate/enterprise systems that are too brittle because their data models were overly simplified.  Sometimes the problem is that the designs were based on short term goals. [This mistake is exacerbated by some development “methodologies” that advocate only designing for today’s needs.] Other times, problems result from making quick assumptions about the nature of things being modeled rather than taking a deep look (i.e. using the fruits of Philosophy’s “best practices” as developed over 2500 years). The case study ahead illustrates this scenario.
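In data-model terms, the question is whether each part carries the property or merely parrots the whole's value. A sketch of the two views (hypothetical classes):

import java.util.List;

/** The collector's view: color belongs to the whole; parts are interchangeable "stuff". */
class CarAsWhole {
    String paintColor;
}

/** The builder's view: each part is an individual with its own intrinsic color. */
class CarPart {
    final String partNumber;
    final String color;   // intrinsic since "manufacture", like the chicken's cells
    CarPart(String partNumber, String color) {
        this.partNumber = partNumber;
        this.color = color;
    }
}
class CarAsAssembly {
    List<CarPart> parts;  // a spare-parts inventory now has the granularity it needs
}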

This case study comes from another BigBank project where there was an enterprise-wide project to revamp systems to use a single standard ID number for each customer.  Their (initial) proposal was published after over a year of analysis and requirements gathering. But, from the very beginning, a Customer was intuitively defined as synonymous with “legal entity”, where a legal entity has only one government Tax ID (e.g. SSN, EIN).

Because only the whole was modeled, all the other systems that needed part-level granularity balked, and the project had to take another year to rework its proposal.  Many systems and business processes needed to work with individual stores, branches, and managers, and they depended on having a separate “customer ID” for each location.  While many companies have separate corporations (i.e. tax IDs) for each branch, some very large companies (e.g. Walmart) do not; and because many systems and business processes required a single bank officer to service a single customer, the one-customer-per-tax-ID model would have funneled thousands of branches onto a single officer's virtual desktop.  It also turned out that, while cities have a single tax ID, some parts (e.g. the City of Atlanta) are not legally liable for the debts of other parts (e.g. the Atlanta airport), even though they all share the same single tax ID.

So, without sufficient analysis, things that seem like wholes, can really be collections of parts.

Thursday, March 4, 2010

Model Entities, not just their parts

One of the oldest puzzles in Philosophy is the paradox of how something can change and yet still be considered the same thing. After all, if “same” is defined as “identical; not different; unchanged”, then how can it “change”?  On the other hand, even if I lose that hand (pun intended), I am still the same me. In chapter 5 of Peter Cave’s new book, “this sentence is false”[1], there is a collection of example paradoxes that illustrate how our intuitions about “sameness” are inconsistent.  Some paradoxes involve entities (or properties) whose definition is "vague", as in “How many cows make up a herd?” or “At what weight does adding a pound change you into being ‘fat’?”  However, here I will be focusing on the change paradoxes involving things with a well defined set of parts. They illustrate the problem with defining something as merely the collection of its parts (unless of course “it” is truly only a collection, and not an entity in its own right).
George Washington's axe
Harry: I have here the very axe with which George Washington chopped down the cherry tree. It’s been used by my family for generations.
Sally: But this says “Made in China”!
Harry:  Well, over the years, the handle was replaced each time it wore out. Oh, and the blade’s been replaced a couple of times too.
Sally: But those are the only two parts…that’s not the same axe at all then!!

Ship of Theseus
(original paradox by Plutarch)
Theseus had a ship whose parts were replaced over time such that, at a certain point, no original pieces were left.
How can the latter ship be said to be the same ship as the original if they have no parts in common?

(sequel paradox by Hobbes)
Suppose that those old parts were stockpiled as they were being replaced, and later they were reassembled to make a ship.
NOW, which ship is the same as the original ship; the one with the original parts, or, the one with the replacement parts?
At the bottom of these paradoxes is the question of whether a thing-made-up-of-parts is the same as the collection of all its parts.  I.E. can everything that can be said of the whole thing be equally said of the collection of all its parts, and vice-versa? For 2500 years, western philosophers, from Socrates, Plato, and Aristotle right through to the 21st century, have been debating this question, generating whole libraries of books and papers.  In fact, Mereology is an entire field of study devoted to just the relationship between parts and their respective wholes.

What does it mean to be an individual?

As discussed (at great length) in the book Parts[2], there is a whole spectrum of things in between “individuals” and “groups”, and they are referred to in everyday language by singular terms (e.g. person), plural terms (e.g. feet), and some words that could mean either (e.g. hair). There are individuals (say, a car), parts of individuals that are themselves individuals (say, a wheel), parts of individuals that are NOT themselves individuals (say, the paint), collections that do not form an individual (say, “the wheels of that car”), collections that DO constitute an individual (say, the car parts that comprise the engine where the engine is itself an individual), and so on, and so on.

A key to distinguishing whether a thing being referred to is truly a thing in its own right (and not just a plural reference masquerading as a single thing) is what sorts of things can be said about it.  Orchestra is an ambiguous term because it can be used as a singular or a plural as in “the orchestra IS playing” vs the equally grammatical “the orchestra ARE playing”.  If it is considered an individual then we can say things about its creation, its history, etc, whereas the plural use simply denotes a collection of players where not much can be said about “it” apart from the count of players, their average age, etc.  Relational Database programmers will recognize individuals as those that get their own record in some entity table, and plurals/sets/collections as equivalent to the result set from some arbitrary query.  SQL aggregate functions (like count, average, minimum, maximum, etc) are the only things that can be said about the result set as a whole. Result sets do not get primary keys because they are not a “thing”, whereas real individuals do (or should!) get their own personal identity key.  Even when an arbitrary query is made to look like an entity by defining a “view”, it is not always possible to perform updates against the search results because the view is not a real entity.

What does it mean to be the same?

A big problem is that there are many different flavors of “sameness” when we say that A is the same as B. Right off the bat there is a difference between Qualitative identity and Numerical identity. Two things are qualitatively identical if they are duplicates, like a pair of dice. Two things are numerically identical if they are one and the same thing, like the Morning Star and the Evening Star (both of which are, in fact, really the planet Venus). They are “numerically” identical in that, when counting things, they only count as one thing. Another complication is the difference between identity (right this second) and Identity over time, which deals with the whole question of how something can be different at two different times and yet still be considered the same thing. For example, you are still considered numerically identical to the you of your youth even though you have clearly changed…although this gets into the even more involved topic of Personal Identity [which may or may not apply to an axe ;-) ]. Traditionally, if x was identical to y, and y was identical to z, then x had to be identical to z. Relative Identity has been proposed such that this need not be true, thus allowing both the morning and evening stars to be identical to Venus but not to each other.

When specifically asking whether the paradoxical ships and axes are numerically identical, as Peter Cave points out[1], two of our usual criteria for being “one and the same thing” are in conflict. They are (a) being composed of largely the same set of parts, and (b) being appropriately continuous through some region of space and time. The continually refurbished ship meets (b), but the reassembled original parts meet (a).

In traditional logic, as formulated in Leibniz’s Law, two things are the “same” only if everything that can be said about one thing can also be said about the other. In other words, all the properties of each object/entity need to be equal if they are one and the same.  By this token, the two axes and the various ships are not the same.  Of course, this means that ANY change to ANY property causes the new thing to not be “the same” as the old. To avoid this, others have said that only essential and not accidental properties should be compared.  This means that the definitions of “ship” and “axe” should distinguish between those properties that must remain the same throughout the lifetime of the object versus those properties that may change over time.
Java programmers can relate to the philosophical meanings of “essential” and “accidental” in the following way. [To keep this sidebar simple, think of “entity beans” where only one bean/object/instance is allowed to represent a particular real world entity (e.g. {name=Joe Blow, ssn=123456789})…i.e. there are never multiple object instances in RAM simultaneously representing Joe.] Class definitions could have “essential” properties implemented via constants (i.e. final instance variables initialized in the constructor, à la the Immutable design pattern), while “accidental” properties are implemented via ordinary instance members.

The essential properties must be final because, if their values were different, then they would have to belong to a different individual. E.g., if an instance of class Person has a constant DNA_Fingerprint_Code with a value of 1234567890, it would not be correct to change that value on that same object, because a person’s DNA both defines them and never changes; i.e. it is “essential” in the Philosophy sense. The correct procedure would be to create a new instance of Person, because it must truly be a different person if it has different DNA. [Of course, this brings up the whole separate topic of the difference between changing a property’s value because it has a truly new value versus merely correcting a mistaken value. Normally, computer software has not been designed to make this distinction, even though doing so would make some systems much more robust and better able to reflect reality.]
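As a hedged sketch of that sidebar (class and member names are mine, purely illustrative), an essential property becomes a final field set once in the constructor, while an accidental property remains an ordinary mutable member:

    // Illustrative sketch, not production code.
    public class Person {
        // "Essential" property: final, initialized once in the constructor.
        // A different value would mean a different individual, so there is no setter.
        private final long dnaFingerprintCode;

        // "Accidental" property: an ordinary mutable member, free to change
        // without threatening the person's identity.
        private String hairColor;

        public Person(long dnaFingerprintCode, String hairColor) {
            this.dnaFingerprintCode = dnaFingerprintCode;
            this.hairColor = hairColor;
        }

        public long getDnaFingerprintCode() { return dnaFingerprintCode; }
        public String getHairColor() { return hairColor; }

        // Fine: hair color is accidental.
        public void setHairColor(String hairColor) { this.hairColor = hairColor; }

        // "Changing" the DNA really means constructing a different individual.
        public Person withDifferentDna(long newDnaFingerprintCode) {
            return new Person(newDnaFingerprintCode, this.hairColor);
        }
    }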

The putative method isTheSame(Object o) would compare either all properties, or only essential properties, of this and o, depending on your philosophy. [This also brings up the whole separate topic of the Java equals() method, and the many potential meanings of “equals” that become apparent when thinking Philosophically.]
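Continuing the hypothetical Person sketch above, the essentialist version of that method might look like this (a Leibnizian version would compare every property, hair color included):

    // Added to the Person class above; compares essential properties only.
    public boolean isTheSame(Object o) {
        if (this == o) return true;               // already numerically identical
        if (!(o instanceof Person)) return false; // not even the same sort of thing
        Person other = (Person) o;
        return this.dnaFingerprintCode == other.dnaFingerprintCode;
    }
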
More than the sum of its parts

So, the particular individual parts of a thing need not all be “essential” properties of that thing, and hence they may change without affecting that thing’s identity.  (You are still you even if you lose a leg or lung, but not a head). Well then, what are some potential essential properties of an individual thing?  Many advocate taking a look at Aristotle’s “four causes” of a thing, where he defined “cause” as anything that was involved in the creation of that thing.  His two main varieties of causes were intrinsic, for causes that are “in the object”, and extrinsic, for those that are not.  The two sub-varieties of intrinsic causes were material cause (the material the thing consists of) and formal cause (the thing’s form [OOP programmers think Class]).  The two sub-varieties of extrinsic causes were efficient cause (the “who” or “what” that made it happen, or “how”) and final cause (the goal, or purpose, or “why”).

By analyzing the paradoxes using Aristotle’s causes, it can be argued that the Ship of Theseus is the same ship because the form does not change, even though the material used to construct it may vary with time. The Ship of Theseus would also have the same purpose, namely transporting Theseus, even though its material would change with time. The builders and tools used may or may not have been the same, so, depending on how much weight you give the efficient cause, that will matter more or less. Thus, giving priority in definitions to some causes over other causes can answer riddles like these.

Furthermore, analyzing the “causes” of a thing’s creation forces one to agree on when a thing actually comes into and out of existence, how to tell it apart from other similar things, how to count them, how to recognize it again in the future, and so forth. Circularly, causes also provide justifications for those agreements. These criteria for identity help define the sortal definition of the thing (i.e. knowing how to sort these sorts of things from other sorts of things, and being able to count them along the way).

Case Studies: BigBank “Facilities” and Customers

I worked on some projects at "BigBank" (a recently defunct Top-5-in-the-USA bank) where these Philosophy-inspired techniques would have really helped.  Here are two case studies that illustrate the problems of modeling the parts but not the wholes.

In the first case study, BigBank (in order to meet new international banking standards) needed to retrofit its computer systems to record and report on its track record in guessing whether loans would eventually be paid off. Each guess took the form of a “default grade” for a package of loans, each package known as a “facility”.

A major problem was that their various systems did not agree on the basic definition of “facility”. This was because the definition of a “facility” went so without saying that no one had actually said (in a rigorous way) what it was. Everyone interviewed knew intuitively what one was but couldn’t quite put it into words, and when pressed, it turned out that they all had different definitions from each other. As a result, the various systems around the bank were built with different ontologies (i.e. models of the world). A key problem was that many of BigBank’s systems assumed that Facilities were no more than the collection of their parts, and so only the parts were recorded, with no standard place to say things about each Facility as a whole. As a result, it came as a surprise to everyone that there had never been any agreement as to when which parts belonged to which wholes, nor even when any particular whole Facility came into or out of existence. Consequently, BigBank had several different “Facility ID”s, none of which agreed with each other, and hence no way to definitively report on the history of any particular Facility.
CASE STUDY:  At BigBank, credit grades are calculated for "facilities". A facility is a collection of "obligations" (i.e. loans, lines of credit) that are being considered together as a single deal and graded as a whole. The particular set of obligations grouped into each "facility" changes over time as individual obligations get paid off or expire. Plus, changed or not, the facilities are supposed to be re-graded from time to time.  Unfortunately, some key BigBank databases only had records for individual obligations. There was no Facility entity table.

So, for example, whenever a "facility" was (re)graded, in reality, only a set of obligation records were updated, all with the same single “facility-grade”. In fact, other than the loan officer's neurons, there was no record of which obligations had been associated with which "facility" over time.  So, when there was a new requirement to store for each facility all its grading documents, there was no place to put them. Even worse, since a Facility entity had never been formally defined, the analysis had never been done to make sure everyone had the same definition of a "facility" (which they didn't).
There was no agreement on what the thing being graded actually was! For some, each individual grading event was considered a "facility" (along with its own "facility ID") because "the grading sheet is what is graded".
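For contrast, here is roughly the shape the missing whole could have taken, sketched in Java (all names are mine, not BigBank’s): a Facility entity with its own identity key, owning its obligation parts and carrying the facts, like grades and grading documents, that can only be said of the whole.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: the Facility as a first-class whole, not just its parts.
    class Obligation {
        final long obligationId;   // each part is an individual too
        Obligation(long obligationId) { this.obligationId = obligationId; }
    }

    class Facility {
        final long facilityId;     // ONE identity key for the whole
        final List<Obligation> obligations = new ArrayList<>();   // parts may come and go
        final List<String> gradingDocuments = new ArrayList<>();  // belongs to the whole
        String currentGrade;       // graded as a whole, not stamped onto each part

        Facility(long facilityId) { this.facilityId = facilityId; }

        void regrade(String grade, String gradingDocument) {
            this.currentGrade = grade;                  // one grade for the facility
            this.gradingDocuments.add(gradingDocument); // and a history with a home
        }
    }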

A second case study (which I detailed back in 2006) involves BigBank's treatment of customer information. Some BigBank systems defined Customer entities and assigned a single ID for each one, but other systems gave the same person or corporation a different ID in each state and called them Obligors. Once again, some systems modeled only the wholes (i.e. customers) and other systems only modeled the parts (i.e. obligors). And once again, because the systems working at the parts level did not tie them together as a whole, there was disagreement about which obligors belonged to which customers.  It had become so bad that the data model had to be changed to allow multiple customers to be tied to a single obligor, lest conflicting data feeds go unprocessed. It was like having Person records and BodyPart records, but needing to kludge in the ability to have multiple people associated with the same particular foot!
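In model terms, the kludge looked something like this hypothetical sketch (names mine, not BigBank’s): the natural shape is one Customer whole owning many Obligor parts, but the fix inverted it so a single Obligor could be tied to several Customers, i.e. one foot with several owners.

    import java.util.ArrayList;
    import java.util.List;

    class Customer {
        final long customerId;
        final List<Obligor> obligors = new ArrayList<>(); // natural: the whole owns its parts
        Customer(long customerId) { this.customerId = customerId; }
    }

    class Obligor {
        final long obligorId;
        // The kludge: one part tied to MULTIPLE wholes, because the systems
        // never agreed which obligors belonged to which customer.
        final List<Customer> customers = new ArrayList<>();
        Obligor(long obligorId) { this.obligorId = obligorId; }
    }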

[1] This Sentence is False, Peter Cave, 2009, Continuum, chapter 5, ISBN 9781847062208
[2] Parts: A Study in Ontology, Peter Simons, 1987, Oxford University Press
[3] Introducing Aristotle, Rupert Woodfin and Judy Groves, 2001