Thursday, April 17, 2008

Reality is the System of Record

"The system of record is the place where there is a definitive value for some unit of data... If you have no system of record for your bank account or if you have multiple systems of record for the same account, something is fundamentally wrong."
Bill Inmon, father of the data warehouse.[1]
Over my years of consulting in large enterprise environments, I've heard arguments over whose system is the "system of record" for some particular piece of data.  I have seen programmers officially acknowledge another system as the SOR, all the while building their own system as if it were the SOR.  I've also heard corporate developers say that something is a customer if and only if it has a record in the SOR (never mind what the customer thinks, or whether the record literally has the name "Donald Duck").  I've seen that same customer information system built with so little regard for mirroring the real world that it defined Frankenstein customers, some of whom were composed of parts from multiple actual people.  Owning an SOR seems to breed a certain lack of humility, and its owners (IMHO) could benefit from learning a little Philosophy.  It will teach you that you are never the system of record, and are instead merely a "faint copy" of one of many idiosyncratic conceptions of the world.

Representationalism

In modern Cognitive Science, there is the assumption that "the mind has mental representations analogous to computer data structures". The idea in Philosophy that "the mind perceives only mental images (representations) of material objects, not the objects themselves"[2] is called Representationalism (and more generally indirect realism), and it can trace its roots across 2400 years of Philosophy of Mind.  Socrates says (in Plato's Allegory of the Cave) that most people only see shadows of puppets instead of reality. Aristotle said thoughts are likenesses of things, and words refer to things indirectly through thoughts.  René Descartes proposed that all sensory information is transmitted by the nerves to a central "theatre", where the soul makes contact with the physical body and watches it.  John Locke said, "The mind represents the external world, but does not duplicate it."  David Hume thought that ideas were "faint copies" of physical sensations.

Skepticism (i.e. the notion that we don't know what we think we know) is one of the oldest ideas in Philosophy, and the raison d'être for the entire branch called Epistemology (which asks how we can be sure we know what we think we know).  Both were born from the ancient realization that our formation of ideas, concepts, and representations out of our sensory input (hidden behind a "veil of perception") is a very inexact process, complete with optical illusions, dreams, hallucinations, color-blindness, double vision, etc.  In Immanuel Kant's "Critique of Pure Reason", he argues that because our minds are hardwired to perceive the world in certain ways, they actively shape experience rather than passively record perceptions.  His claim was that our sense perception is effectively a pair of tinted glasses that we can't take off.  Because we've never seen the world without them, it takes effort to see the world as it really is.  One of the very reasons to practice Philosophy is to understand the true nature of things and avoid the "naive realism" of "common sense". Guides to this understanding are found in the branches of Metaphysics and Ontology, which explore how to understand and model the world respectively.  One of the tactics of Epistemology to aid in this effort is the doctrine of Verificationism, which says that a statement has no meaning if there is no way to verify its truth.

For programmers, the big epiphany here should be that their business database is really just one particular representation of reality, not reality itself.  Reality is the System of Record. A developer should be humble, and realize that it is difficult to accurately model the world such that it integrates with the data models of other systems.  They also should be skeptical that their actual data is both accurate and up-to-date, developing ongoing mechanisms to actively verify each. 

Syncing the Systems of Record

When any system keeps copies of data for which it is not the system of record, that system is actually just a cached copy of that external data.  And like any cache, it has the responsibility to keep track of whether its copy is "stale", and to implement mechanisms to ensure "cache coherence". Once one realizes that reality is the system of record, it becomes clear that it is not enough to keep databases in sync with each other; they ALL must chase after the ever-changing state of the world.
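To make the cache analogy concrete, here is a minimal sketch of a record that remembers when it was last checked against the outside world. Everything in it is an assumption made for illustration: the names `CachedRecord` and `records_needing_verification`, the `last_verified` timestamp, and the idea of a single maximum age are hypothetical, not taken from any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class CachedRecord:
    """A local copy of a fact whose true source is the real world (hypothetical)."""
    value: str
    source: str               # where this copy came from
    last_verified: datetime   # last time the value was checked against reality

    def is_stale(self, now: datetime, max_age: timedelta) -> bool:
        # A copy is stale once it has gone unverified for longer than max_age.
        return now - self.last_verified > max_age


def records_needing_verification(
    records: Dict[str, CachedRecord], now: datetime, max_age: timedelta
) -> List[str]:
    """Return the keys of all records that should be re-verified, in sorted order."""
    return sorted(key for key, rec in records.items() if rec.is_stale(now, max_age))
```

A nightly job could call `records_needing_verification` and queue the results for re-checking with the customer or the original source, rather than waiting for someone to complain that the data is wrong.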

In a large organization, each database is just one of many competing representations. In the same way that different people have different mental representations of their shared reality, different databases will each have their own slightly (or largely) incompatible data models, and sets of data values, for the same domain entities.  A side-effect of this is that many systems keep copies of external data that they've transformed in some way for their own use.  When systems don't realize that their SOR data represents the same real-world entities as other SORs do (e.g. patient DB versus employee DB), the eyesight of the entire enterprise goes out of focus as each data-set drifts apart.
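The drift described above can be detected, at least in a simplified setting. The sketch below assumes two databases have already been matched on a shared person identifier and reduced to plain dictionaries of fields; the function name `find_drift` and the sample field names are hypothetical, and real systems would first need the much harder step of matching entities across incompatible data models.

```python
from typing import Any, Dict, Tuple

Record = Dict[str, Any]


def find_drift(
    db_a: Dict[str, Record], db_b: Dict[str, Record]
) -> Dict[str, Dict[str, Tuple[Any, Any]]]:
    """Report fields on which two databases disagree about the same entity.

    Only entities present in both databases, and only fields present in both
    records, are compared. The result maps entity id -> field -> (a_value, b_value).
    """
    drift: Dict[str, Dict[str, Tuple[Any, Any]]] = {}
    for entity_id in db_a.keys() & db_b.keys():
        rec_a, rec_b = db_a[entity_id], db_b[entity_id]
        diffs = {
            field: (rec_a[field], rec_b[field])
            for field in rec_a.keys() & rec_b.keys()
            if rec_a[field] != rec_b[field]
        }
        if diffs:
            drift[entity_id] = diffs
    return drift
```

Note that a disagreement only shows that at least one copy is wrong. Neither database settles the question; the person themselves does.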

CASE STUDY: ChoicePoint

ChoicePoint was an independent company prior to being bought out by Reed Elsevier in 2008.  It collected and combined data about businesses and individuals from a wide variety of sources, selling access to both private and public (i.e. government) clients.  While consulting there, I saw firsthand how data was effectively only verified if someone phoned in to complain about inaccuracies. I also was informed by management that no access control mechanisms were to be included in the design of a new system, despite Congress exploring adding requirements for such.  They said that ChoicePoint wanted to be able to protest the cost of adding the controls after the fact, in the hope that doing so would defeat the requirements in the first place. There have been a whole series of lawsuits and government actions against ChoicePoint for out-of-date data, inaccurate data, and selling data to unauthorized buyers, which cost it so much money that it had to be sold.

[1] The System of Record in the Global Data Warehouse, Bill Inmon, Information Management Magazine, May 2003
http://www.information-management.com/issues/20030501/6645-1.html

[2] Representationalism, Encyclopædia Britannica
http://www.britannica.com/EBchecked/topic/498476/representationism

