Saturday, February 15, 2014

It's about Time, It's about Space

A 1960s TV series theme song began, "It's about time, it's about space...". Some, from Physics to Philosophy, say it's about both, claiming they are each aspects of a single space-time. Computer systems developers need to consider this as they build GIS applications.

Ontology, being the branch of philosophy concerned with describing "what exists", tackles the topics of Space and Time since they are often used to describe things. An Introduction to Ontology[1] devotes a chapter to each. As usual, things are more complicated than our initial intuition expects, and debate continues about different viewpoints. In a nutshell, the following are discussed:
  • Space is usually defined in terms of "regions"
  • Space is either absolute or relative.
  • Space is either something things "are in", or it is synonymous with the thing itself
    i.e. regions only have properties like size and location, versus, a region itself having the property blue if the stuff in it is blue.
  • Space is either Euclidean or not (i.e. flat or curved)
  • Space is either separate from Time, or parts of the same thing: space-time.
Most programmers today, in the age of Map apps, Geographic Information Systems, and geocoding, take the view that an entity such as a business or address is located at some location. The location ideally could be defined as a collection of regions defined by GPS coordinates. Often, the location is (over)simplified to a single point on a map.

While it is recognized that many problems exist with actual databases of geocoded entities, it is usually assumed that they are in the realm of epistemology rather than ontology. In other words, it is assumed the problem is with "our knowledge" due to inaccuracies in the set of GPS coordinates; not that locations don't actually have a definite set of coordinates.

However, not every entity that takes up space has a well-defined and unchanging mapping to a set of GPS coordinates.  ZIP codes, for example, are not defined in terms of geography but rather as collections of delivery routes. Another example, as shown in the title insurance case study below, is in real estate legal descriptions. In addition to a knowledge problem caused by ambiguous language used in these descriptions, they can also refer to ephemeral landmarks.

While a naive assumption that space is different than time is often made in data model design, entities like ZIP codes and Legal Descriptions require a time dimension to be completely accurate. It turns out that the mapping of zip codes to postal routes changes several times a year.  And landmarks, referred to in property descriptions, can change location and shape over time.

Case Study: TICOR Title Insurance System
OMEX was a startup that was an early pioneer in creating optical disk technology for data storage. It took on a contract to produce a computer system to support TICOR, the largest title insurance company in the U.S.  TICOR itself had the contract to keep backup copies of all the real estate transactions filed with Los Angeles county.  As a part of archiving copies of the documents, it was free to use the information in them, and hence support its business of providing title insurance.

The computer system was to replace using microfilm photos of the documents with optical disk storage of the images.  It would link these images with a structured database of information related to each property. One of the goals of the database was to enable answering basic questions about property locations.

The programmers, having a naive notion of how property boundaries were defined, were surprised to see that a common method is “metes and bounds” which uses plain english descriptions using landmarks. E.G. "beginning with a corner at the intersection of two stone walls near an apple tree on the north side of Muddy Creek road one mile above the junction of Muddy and Indian Creeks, north for 150 rods to the end of the stone wall bordering the road, then northwest along a line to a large standing rock on the corner of the property now or formerly belonging to John Smith, thence west 150 rods to the corner of a barn near a large oak tree, thence south to Muddy Creek road, thence down the side of the creek road to the starting point."

As can be seen, it would be difficult to translate this into a collection of GPS coordinates. But even if you did, you would not be done with the problem.  Like ZIP codes that change over time, the location and shape of creeks, rivers, etc change over time. Lest you think this is a merely theoretical problem, for centuries, States have sued each other over land ownership due to border rivers migrating over time.

Ultimately, the computer system wound up just using unstructured text fields to contain the legal description rather than the more ambitious GIS database they had originally promised.

[1] An Introduction to Ontology, Nikk Effingham, Polity Press, 2013
[2] A River Runs Thru It, How the States Got Their Shapes, History Channel, 2011

Saturday, April 27, 2013

Form Input Validation and Defeasibility

A major subcategory within Philosophy is the study of Logic.  Computer programmers are very familiar with some categories of logic, for example Boolean logic, but they may not be aware of how many different kinds of logic philosophers have come up with in the past 2500 years. While the familiar systems of logic seek to insure their conclusions are absolutely true, there are other logics that do not require this, and they have lessons for programmers.

Aristotle described a system of logic that was so iron-clad, it was 2000 years before philosophers and mathematicians started seriously adding to it. It is famous for defining the syllogism which is a form of deductive logic, where certain conclusions can be mathematically proved to be true as long as the starting premises are true. The prototypical example of a syllogism is:

P1: All men are mortal.
P2: Socrates is a man.
C: Therefore, Socrates is mortal.

As long as the two premises P1 and P2 are true, C is guaranteed to be true.

However, unlike deductive reasoning, everyday situations usually need to come up with conclusions with less than absolute provability. Jurors need to come up with a verdict even though they have not been given iron-clad arguments. One of the several types of reasoning that is not deductive is called defeasible logic.

Many everyday arguments rely on assumptions that are true "all things being equal". For example, if told that birds were being transported in boxes with no lids, you might conclude that there was a large risk that the birds would fly away, and therefore lids were needed. Given what you know, this is a reasonable conclusion. However, those arguments are defeasible, which means that there are other additional statements that could change the conclusion if they were added to the mix. If we add the statement "the birds being transported are penguins", the conclusion that they might fly away would be defeated since penguins can't fly. On the other hand, some arguments are indefeasible. For example, "John is a bachelor therefore John is unmarried" would not be false, no matter what additional statements are made because being unmarried is the very definition of being a bachelor.

It is helpful to know this little distinction between defeasible and indefeasible conclusions, as is shown in the following little case study.

Case Study: Form Input Validation
It is very common for electronic form systems (e.g. a web page order form) to validate the data entered by a user before allowing the form to be processed. In developing a widget framework used for building  a collection of online banking applications, there were two levels of form input validation provided. One level caused error messages to be immediately given to the user, and the other level would only display after the user attempted to submit the form.

Knowing the distinction between defeasibility and indefeasibility is the key to knowing which level any particular validation rule should be.  For those input strings that are invalid and no additional input will change that, their error status is indefeasible and the user can be notified immediately. For example, if a field requires a number and the user has entered a letter "A", no additional input will make it a legal number.

On the other hand, if a field requires an email address, and the user has entered "foo@", that is invalid because it is not a complete address. However that status is defeasible because more input can change it to a valid value.  Therefore the user should not be harassed about it until a submit is attempted.

Saturday, February 2, 2013

What is Philosophy?

Because I am endeavoring to teach ideas from Philosophy to computer programmers, who typically have no background in it, a question I must answer right off the bat is "what IS philosophy?"  Here is the answer I provided to a new MOOC "Introduction to Philosophy":
Given that the single word "philosophy" is commonly used to refer to multiple different (albeit related) things, all answers to the question "Q: What is [Western] Philosophy?" actually depend on one: "doing philosophy; the act of philosophizing" which I explain to even children as:
A: Being able to answer the question, Why do you think what you think? Why do you believe what you believe?
And while you want to be able to answer to someone else's satisfaction, you foremost want to be able to answer to your own satisfaction because you want to know the truth. (BTW, the western bit is "we believe humans are capable of working it out ourselves")
All the other things the word philosophy can refer to build on, and depend on, the above. e.g. Philosophy as meaning "the body of knowledge accumulated by those doing philosophy" (which entails history of philosophy, individual ideas, individual philosophers, tools/techniques/criteria for doing it well, favorite topics for philosophizing about, etc, etc) all naturally spring from someone somewhere starting to "do philosophy".

Of course for programmers who will say, "so what does that have to do with me?", I have to quickly add that even thousands of years ago, philosophers already had come up with some really good techniques for describing the world, and being able to justify that those techniques were better than our intuition.  So, since a large portion of our work as programmers is to describe the world via our data models, object models, class hierarchies, classification schemes, etc, etc, we would get better at it if we replaced our intuition with techniques philosophers know but we typically havent been taught.