Monday, March 22, 2010

Don't Call Names...It IS polite to point!

Mom always said "Don't call names!" and "It's not polite to point!"  OK, so how am I supposed to refer to "you know who" over there?  Too late! I already "pointed" verbally when I said "over there".  But not only is that OK for programmers, it is actually preferable to point rather than to use names. And a half-century before computers even existed, Philosophers already knew this.  So why are programmers still calling names?
A bit of background first for programmers...
The philosophical use of the word "reference" (and hence "refer") has a subtle technical meaning that, luckily for us, corresponds with the object-oriented technical term "reference".  In Philosophy, the only way words can say something about the real world is via "reference". In Java programs, the only way to say something about an object is via an "object reference".  In other programming languages it is via a "pointer".  Interestingly, according to the 20th century philosopher Bertrand Russell, the only way one can truly refer to a thing (using language) is via a "demonstrative" (i.e. pointer words like "this", "that", "those", "these").

Contrary to previous thinkers, Russell held that proper names (e.g. Joe Blow) do not "refer".  OO programmers can relate to this because a reference (or pointer) to a Person object can access that object's properties, but the string "Joe Blow" cannot...

Person p = new Person("Joe Blow");   // get an object reference "p"
p.weight = 175; // THIS WORKS!
"Joe Blow".weight = 175; // THIS DOESN'T WORK! (a String has no "weight" field, so it won't even compile)

Now, the string/name "Joe Blow" could be used in a query that describes a Person object and returns a reference to it. And WAAAY before computers, Russell said the same thing.  Names are a description of something and not a reference to it.
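Here is a minimal Java sketch of that distinction. The Person and PersonDirectory classes are hypothetical, made up for illustration and not taken from any library. The name is only a search criterion matched against a property; what the query hands back is the reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical classes, for illustration only.
class Person {
    String name;
    int weight;

    Person(String name) {
        this.name = name;
    }
}

class PersonDirectory {
    private final List<Person> people = new ArrayList<>();

    // Stores the person and hands the same reference back.
    Person add(Person p) {
        people.add(p);
        return p;
    }

    // The name acts as a description: it is matched against a property,
    // and the first object that fits (if any) is returned as a reference.
    Optional<Person> findByName(String name) {
        return people.stream()
                .filter(p -> p.name.equals(name))
                .findFirst();
    }
}

class NameQueryDemo {
    public static void main(String[] args) {
        PersonDirectory directory = new PersonDirectory();
        directory.add(new Person("Joe Blow"));

        // The query turns the description into a reference...
        Person p = directory.findByName("Joe Blow")
                .orElseThrow(() -> new IllegalStateException("nobody fits that description"));

        // ...and only the reference can reach the object's properties.
        p.weight = 175;
        System.out.println(p.name + " weighs " + p.weight);
    }
}
```

Note that findByName returns an Optional: a description may fit nothing at all, which is exactly the "Unicorn" problem that a real reference never has.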
(Descriptivist) Philosophers declared that names were not references because there are a number of problems in logic that arise if they are.  I have written about several of these in earlier blog posts, but in a nutshell:
  • names can change even though the object doesn't (e.g. maiden names)
  • names can have meaning over and above referencing an object (e.g. Superman vs Clark Kent)
  • objects can have more than one name (e.g. Morning Star vs Venus)
  • names can be given to objects that don't actually exist (e.g. Unicorn)
  • not every object has a name (e.g. that piece of paper over there)
While OO programmers may know that a pointer or reference to an object is different from a "name", many database designers haven't absorbed that yet.  Of course, they can be forgiven somewhat because the relational database model does not really give them references or pointers.  The only way to access the properties of an object (aka entity) is via a query (and hence a description).  This has led to the common practice of creating an artificial property (aka surrogate key) that can be made to have a unique, unchanging value for every different object/entity, and is a close substitute for a "reference".
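A rough sketch of the surrogate key idea, using a hypothetical in-memory PersonTable as a stand-in for a database table (a real database would generate the key with a sequence or identity column; the counter here is just an assumption for the demo):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical row type, for illustration only.
class PersonRecord {
    final long id;   // surrogate key: generated, meaningless, never changes
    String name;     // real-world property: free to change

    PersonRecord(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical in-memory stand-in for a database table.
class PersonTable {
    private final Map<Long, PersonRecord> rows = new HashMap<>();
    private long nextId = 1;

    // Inserts a row and returns its generated surrogate key.
    long insert(String name) {
        long id = nextId++;
        rows.put(id, new PersonRecord(id, name));
        return id;
    }

    // Looks a row up by surrogate key; returns null when no such row exists.
    PersonRecord findById(long id) {
        return rows.get(id);
    }
}

class SurrogateKeyDemo {
    public static void main(String[] args) {
        PersonTable table = new PersonTable();
        long id = table.insert("Jane Doe");

        // The name changes, but the key still reaches the same row.
        table.findById(id).name = "Jane Smith";
        System.out.println(id + " -> " + table.findById(id).name);
    }
}
```

Two rows with the same name still get different keys, so the key behaves much like a pointer: it says nothing about the object, it just gets you to it.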

On the other hand, there is also the practice of using the name property of an object (or any other real world properties) as a reference mechanism (aka natural key), and so naturally there is great debate about whether this is ok or not, and when to use one or the other.

Philosophy would counsel (as would I) not to call names (i.e. don't use natural keys), and not to use artificial keys that the world knows about (like Social Security Numbers, because even those have duplicates!). Below are a few case studies of problems arising from name calling rather than pointing.  They share a base problem: the natural key data is almost never "essential" in the philosophical sense; i.e. the data is capable of changing over time even though the object is considered to be the same object.
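To see that base problem in miniature, here is a small Java sketch with a hypothetical NameKeyedRegistry that uses the name as its key. The classes are invented for illustration; the point is only what happens when the "same object" gets a new name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical classes, for illustration only.
class Member {
    String name;

    Member(String name) {
        this.name = name;
    }
}

// A registry that "calls names": the name property doubles as the key.
class NameKeyedRegistry {
    private final Map<String, Member> byName = new HashMap<>();

    // Files the member under whatever the name is at registration time.
    void register(Member m) {
        byName.put(m.name, m);
    }

    // Returns the member filed under this name, or null if there is none.
    Member find(String name) {
        return byName.get(name);
    }
}

class NaturalKeyDemo {
    public static void main(String[] args) {
        NameKeyedRegistry registry = new NameKeyedRegistry();
        Member m = new Member("Jane Doe");
        registry.register(m);

        // Same person, new name (e.g. a maiden name changing).
        m.name = "Jane Smith";

        // The key is now stale: the current name finds nothing,
        // while the outdated name still leads to the object.
        System.out.println("Jane Smith found? " + (registry.find("Jane Smith") != null));
        System.out.println("Jane Doe found?   " + (registry.find("Jane Doe") != null));
    }
}
```

Keeping the key in sync means finding and rewriting every place the old name was stored, which is presumably the kind of "consistency risk" that makes sites refuse the change altogether.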

Case Study: Yahoo Bookmarks

It turns out that once a bookmark is created on the Yahoo Bookmarks site, there is no way to change the URL associated with that bookmark.  Someone decided to use the URL as a natural key (which by definition should never change), so the URL can't be edited.  The problem is that a "bookmark" (by my thinking) is not synonymous with a URL.  It is a marker that enables me to return to a web page.  With web sites being revamped all the time, and most URLs not being "permalinks", the same page can have its URL change over time.  If I need to update the URL, Yahoo makes me delete the old bookmark, create a new one, and re-enter the name, comments, etc. from scratch.

Case Study: Qlubb site names

There is a web portal where one can create free web sites for small organizations (i.e. clubs aka qlubbs).  To create a site, you select a club name and then customize the generic site created for you.  However, in the help page, they warn that there is no way to change the name of your club once it is created:
"We currently do not support the ability to change the Qlubb name as there may be database consistency risks. However, if your Qlubb really want to change the name, please have a Qlubb administrator send a note to help at Qlubb with your request. We will evaluate each request and perform the change manually, if it is safe to do so."
Obviously someone mistook a name for a unique and unchanging key.

Case Study: Semantic Web

For all those programmers groaning at the last example, asking how anyone can still make that old mistake in this day and age, the same thing goes on in the new frontier of the Semantic Web.  In most tutorials on the Semantic Web, or tutorials for logic programming languages like Prolog, and yes, even in my youthful whack at a semantic network database, names of things are confused with references to those things.  As I wrote about the flaw in my SN database back in 2007, this causes real problems in all the ways that Philosophers recognized a century ago.
