Tuesday, September 29, 2015

No Social Constructs in My Little Town

In Paul Simon’s song “My Little Town”, there is the lyric “everything’s the same in my little town”. Contrast that with “the big city”: cosmopolitan, multicultural, not all the same. It is only when you have experienced more than one culture that it becomes natural to see that there can be more than one notion of how things are. In Philosophy, when something has one “correct” definition because it exists in nature, independent of people, it is called a “natural kind”. However, when something only exists because people have agreed to think that it does, it is called a “social construction”. Computer programmers need to be aware that most of the things they model in their databases, user interfaces, and business models are social constructions. Hence, there are many ways to skin a cat, but not arbitrary ways. Two case studies show how programmers can err on either side of the spectrum.

Social Constructs versus Natural Kinds

It is common to consider Natural Kinds to be “discovered”, and Social Constructions to be “invented”.  An example of something that only exists because we say it does is money.  A Bitcoin (or an ounce of gold, or a piece of paper with $100 printed on it) is worth whatever we say it is, and how many Big Macs can be bought with each is whatever we agree upon.  And via the computerized marketplace, we can change our collective mind every microsecond.

Social Constructions are also inherently “relative” to some culture, which means that there can be more than one version of it floating around, and each can be equally valid.  A traditional example is the definition of what it means to be a woman.  While there are the aspects of womanhood that are controlled by DNA and biology, many aspects are defined by society, and there are many different definitions existing simultaneously.

On the other hand, we believe that atoms exist in nature. We may discover better definitions and understandings of them over time, but those would be mere changes in our knowledge of them, rather than making them an invention. When Europeans thought that swans only came in white, and then found black ones in Western Australia, their definition of swan changed, but we still think that swans are in fact a natural species, regardless of whether you’re European.

Natural Kinds that Aren’t

Sometimes something is thought to be a natural kind but is later realized not to be. Planets are an example. We thought that planets existed objectively, independent of our latest definition. We now realize that those orbiting chunks of rock and clouds of gas may exist objectively, but our classifications of “planet” versus “dwarf planet” versus “failed star” do not. That is, they are arbitrary enough that aliens landing here would likely have different ways of classifying orbiting stuff.

Saving the Phenomena

So, if planets aren’t “real”, where did they come from? An early view of the universe was that everything literally revolved around the Earth in perfect circles, except for a handful of “wanderers”, the original meaning of the Greek word behind “planet”. As we gained more knowledge, we updated our definitions. But we always (if only unconsciously) wanted to “save the phenomena”: to make sure that the new definitions neither dropped any planets nor added any. Otherwise, we would be defining something that did not match our intuitive notion of planets.

Recently, though, that became impossible, because we realized that, to be consistent, we would either have to add hundreds of new “planets” or drop Pluto (in 2006 the International Astronomical Union chose to drop Pluto). The more people tried to keep the original collection, the clearer it became that the collection was based on culture and history rather than on an objective category of things in space. Some even say that Jupiter is not really a planet, but a failed star.
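For programmers, the 2006 IAU definition is a neat example of a social construction written down as explicit rules: a body orbiting the Sun is a planet if it is (b) massive enough to be nearly round and (c) has cleared the neighbourhood around its orbit, and a dwarf planet if it is round but has not cleared it. The Python sketch below is a simplification that assumes each criterion can be reduced to a yes/no input; the resolution gives no numeric thresholds, so deciding "round enough" or "cleared" is itself a judgment call.

```python
from dataclasses import dataclass

@dataclass
class Body:
    name: str
    orbits_sun: bool     # (a) in orbit around the Sun (moons are excluded here)
    nearly_round: bool   # (b) enough self-gravity to reach hydrostatic equilibrium
    cleared_orbit: bool  # (c) has cleared the neighbourhood around its orbit

def iau_2006_class(body: Body) -> str:
    """Simplified reading of the IAU 2006 planet / dwarf planet definitions."""
    if not body.orbits_sun:
        return "not classified here"  # e.g. a moon
    if body.nearly_round and body.cleared_orbit:
        return "planet"
    if body.nearly_round:
        return "dwarf planet"
    return "small Solar System body"

for b in [
    Body("Jupiter", True, True, True),
    Body("Pluto", True, True, False),
    Body("Ceres", True, True, False),
]:
    print(b.name, "->", iau_2006_class(b))
```

Under these rules Pluto lands alongside Ceres as a dwarf planet. The code is trivial, and that is the point: the hard part was the social agreement on which rules to write.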

Case Study: World Headquarters in My Little Town

The world headquarters of Coca-Cola is in Atlanta, and while a world headquarters would be expected to be pretty cosmopolitan, it is in The South, which was traditionally very monocultural, conservative, and religious (which I can say because I grew up there). I was there on a Y2K project, redesigning data files that used just 2 digits to represent years (even though, by the way, the file formats had been specified only two years previously). During discussions, the developers (all Atlanta locals) assumed that there was only one obvious “correct” way to represent dates in a string: mmddyyyy. Having lived overseas, I knew to point out that most of the world doesn’t do it that way, instead using ddmmyyyy or yyyymmdd (and we did not even get into other calendar systems).

The point is that dates were assumed to be a natural kind when they are actually very socially constructed. In my little town everything is the same, and therefore looks like the one and only way God intended.
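A small Python sketch makes the point concrete (the original system's files and language aren't described here, so this is purely illustrative). It writes the same day in the three conventions mentioned above, then shows the reverse problem: one eight-digit string that means two different days depending on whose convention you assume.

```python
from datetime import date, datetime

# The same calendar day written three ways; none is more "natural" than the others.
d = date(2015, 9, 29)
FORMATS = {
    "mmddyyyy (US)": "%m%d%Y",
    "ddmmyyyy (common outside the US)": "%d%m%Y",
    "yyyymmdd (ISO 8601 basic format)": "%Y%m%d",
}
for label, fmt in FORMATS.items():
    print(f"{label:34} {d.strftime(fmt)}")

# The reverse problem: the meaning of an 8-digit string depends on the assumed convention.
raw = "03042015"
as_us = datetime.strptime(raw, "%m%d%Y").date()  # read as March 4, 2015
as_eu = datetime.strptime(raw, "%d%m%Y").date()  # read as April 3, 2015
print(as_us, as_eu, as_us == as_eu)              # 2015-03-04 2015-04-03 False
```

Whichever convention a file uses has to be a documented decision, not an assumption. yyyymmdd has one extra practical advantage: fixed-width strings in that order sort chronologically as plain text.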

Case Study: Bizarrely Arbitrary User Interface at domain.com

While the concept of “social construction” says that there can be several equally valid ways of defining some things, do not forget the “social” part! That is, you should not create an arbitrary definition that no one actually uses and that, therefore, no one will understand.

On the website of domain.com (a domain name registrar), the domain name registration form includes a mandatory phone number field. The required format is so bizarre, though, that it took a chat with customer support to figure out what it was. It turned out to require the phone number to be entered as the fractional portion of a floating point number… let that sink in… floating point notation, with a mandatory leading plus sign and a mandatory integer part of 1. So, the phone number “(123) 456-7890” had to be entered as +1.1234567890. And, to make matters worse, the error message received when it was not entered that way only said that a legal phone number was required, without explaining what the non-obvious required format was.

When I pressed the support chat operator for an answer to my question, WTF?!, I was told (after some time on hold) that “that was the format that the developers chose”. There was no answer to my real question: of all the phone number formats on the planet, who has ever used that? Apparently it came from the culture of domain.com’s off-shored contract developers, with no managers technical enough to review the design.
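In fairness to the developers, the +1.1234567890 shape is not pulled from thin air: it matches the phone number syntax ("+", country code, ".", number) in the EPP contact mapping (RFC 5733) that registrars use when talking to registries. If that is where it came from, the mistake was exposing an internal wire format to customers instead of translating at the boundary. Here is a minimal Python sketch of that translation. It assumes North American numbers only, and the function name `to_epp_phone` is hypothetical, not domain.com's code.

```python
import re

def to_epp_phone(user_input: str, country_code: str = "1") -> str:
    """Accept a phone number typed the way people usually type it and return it
    in the '+CC.NNNNNNNNNN' form. Sketch only: assumes North American
    ten-digit numbers and does not check area-code validity."""
    digits = re.sub(r"\D", "", user_input)  # drop spaces, dashes, parens, dots, '+'
    if len(digits) == 10 + len(country_code) and digits.startswith(country_code):
        digits = digits[len(country_code):]  # the user typed the country code too
    if len(digits) != 10:
        # Say what was expected, instead of just "a legal phone number is required".
        raise ValueError(
            f"Could not read {user_input!r} as a 10-digit phone number; "
            "enter it like (123) 456-7890."
        )
    return f"+{country_code}.{digits}"

print(to_epp_phone("(123) 456-7890"))   # +1.1234567890
print(to_epp_phone("+1 123-456-7890"))  # +1.1234567890
```

The customer types whatever is normal for them, and the registry still gets the format it needs.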

Friday, July 4, 2014

Programmers also need Moral Philosophy

I stand corrected; programmers need knowledge of moral philosophy too. I realized this after hearing a BBC story about developers of self-driving cars explicitly asking philosophers for help in formulating which person the car should hit.

When first starting my project to teach other programmers all the practical concepts I was learning from Philosophy, I focused on ontology, the study of describing the world. Philosophy has 2500 years of study of this topic that computer science naively leaves up to intuition. I thought that only the IS side of Hume's IS/OUGHT divide would be relevant to actual programming.  It turns out that real programmers doing real software development need the OUGHT side too.

Hume's IS/OUGHT Divide

The philosopher David Hume distinguished descriptive statements about "what IS" from prescriptive statements about "what OUGHT to be", and argued that no OUGHT follows from IS statements alone. Even so, one can't judge what ought to be without a clear, accurate understanding of what is.

One of the basic tasks of Philosophy is to try to explain and justify one's intuition and gut reactions via a set of explicit logical rules.  The IS side of things worries about the best way to describe and categorize things, and how we can justify that we know what we think we know.  The OUGHT side of things worries about rules guiding "moral" decisions, and which rules apply in which situations, and what are the overriding goals of each rule system. In other words, what is the "right" thing to do.  In both of these categories, it turns out that our intuitions often result in conflicting answers, thus the need to analyze and sort them out (ahem, easier said than done).

The IS statements are the ones that first come to mind when developing a self-driving car. What IS the terrain? The car's speed? The distance to the curb? Where will the car in the next lane be in two seconds? These are the kinds of questions covered in Artificial Intelligence classes, the ones that must be answered before a car can drive at all. They let a system detect potential collisions and formulate the set of options available to avoid them.

But, IS statements don't describe which of those options is the "right one", the choice it OUGHT to make. It is only after you realize that sometimes there is no purely "good" option, no option that leaves everyone unscathed, that you realize you will have to program the car to decide who to hit! How does the poor programmer decide that?! Luckily, some programmers had the wisdom to call a philosopher for help encoding moral rules rather than blindly using their programmer's intuition.

So, if a self-driving car hits someone, who OUGHT to have responsibility?  The auto maker? The car owner? The car's software developer who programmed its rules?   If a bicycle darts in front of the car, but the action of swerving to avoid an inevitable collision will itself cause a collision with someone else, who OUGHT to be hit?  The "at fault" bike? The more-likely-to-survive but "innocent" car in the next lane? Override the "never cross the double yellow line" rule and swerve into oncoming traffic (potentially resulting in a chain reaction)?
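One way to honor the IS/OUGHT split in code is to keep the two in separate layers. Perception and prediction produce the list of physically available options (IS), and a separate, named policy function chooses among them (OUGHT), so the moral rules are explicit and reviewable rather than buried in the planner. The Python sketch below is hypothetical throughout: the class and function names, the two toy policies, and especially the injury probabilities are invented, and it is not modeled on any real self-driving system. It only shows that the same facts yield different choices under different rules.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """One manoeuvre the IS layer says is physically available (hypothetical model)."""
    name: str
    harm: dict[str, float]                  # predicted probability of injury, per party
    at_fault: frozenset[str] = frozenset()  # parties judged to have created the hazard
    breaks: frozenset[str] = frozenset()    # traffic rules the manoeuvre would violate

Policy = Callable[[list[Option]], Option]   # the OUGHT layer: pick one option

def total_harm(o: Option) -> float:
    return sum(o.harm.values())

def least_total_harm(options: list[Option]) -> Option:
    """Consequentialist flavour: minimise summed expected harm; rules are ignored."""
    return min(options, key=total_harm)

def spare_the_innocent(options: list[Option]) -> Option:
    """Rule-first flavour: keep only options that break no traffic rule (if any
    exist), then prefer the one putting the least risk on parties not at fault."""
    def innocent_harm(o: Option) -> float:
        return sum(p for who, p in o.harm.items() if who not in o.at_fault)
    lawful = [o for o in options if not o.breaks] or options
    return min(lawful, key=lambda o: (innocent_harm(o), total_harm(o)))

# IS-layer output for the bicycle scenario (all numbers invented for illustration).
options = [
    Option("brake in lane", {"cyclist": 0.6}, at_fault=frozenset({"cyclist"})),
    Option("swerve right", {"car in next lane": 0.3}),
    Option("cross double yellow", {"oncoming car": 0.5, "following traffic": 0.2},
           breaks=frozenset({"double yellow line"})),
]

policies: list[Policy] = [least_total_harm, spare_the_innocent]
for policy in policies:
    print(policy.__name__, "->", policy(options).name)
# least_total_harm -> swerve right
# spare_the_innocent -> brake in lane
```

Neither toy policy is offered as the right answer; which one the car OUGHT to run is exactly the question the philosophers were called in for.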

"Moral" Philosophy

When looking at the language describing the scenarios above, we see words like "action", "choice", "responsibility", "cause", "result", "fault", "innocent", "never", and "more likely to survive". These lead to classic concepts in moral philosophy like Action and Agency, Causation, Free Will vs Determinism, Moral Responsibility vs Moral Luck, Desert (i.e. who deserves what) and Legal Punishment, which are intertwined in the following way: we expect those making decisions to be morally and legally responsible for the consequences of their actions, assuming that they were able to make a free choice.

But there are debates about whether the ends justify the means (Consequentialism) or a bad deed is a bad deed regardless (Deontological Ethics). There are also debates about what the overarching goals should be: the most good for the most people (Utilitarianism), versus priority for the worst off (Prioritarianism), or the most freedom (Libertarianism), or the most equality (Egalitarianism), etc.

These are just the tip of the iceberg, but they are worth studying, since they provide a language for documenting and explaining your ultimate set of rules, as well as making you aware of the many non-trivial scenarios. Lest you programmers think that Philosophy is overkill, take a look at books like "The Pig That Wants to Be Eaten", which catalogs many well-known thought experiments and moral paradoxes that arise from relying on intuition and gut reactions.
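Two of the overarching goals mentioned above can be stated precisely, which is the form a programmer would eventually need. In a standard textbook formalization (a sketch, not any particular philosopher's exact version), let u_i be the well-being of person i under a given option:

```latex
W_{\text{util}} = \sum_i u_i
\qquad
W_{\text{prior}} = \sum_i g(u_i), \quad g \text{ strictly increasing and strictly concave}

% Worked example: two people, two options.
%   Option A gives (u_1, u_2) = (10, 0);  option B gives (4, 4).
W_{\text{util}}(A) = 10 > 8 = W_{\text{util}}(B)
\qquad\text{but, with } g = \sqrt{\cdot}:\quad
W_{\text{prior}}(A) = \sqrt{10} \approx 3.16 < 4 = 2 + 2 = W_{\text{prior}}(B)
```

Utilitarianism picks A and Prioritarianism picks B: because g is concave, a gain counts for more when it goes to someone who is worse off.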

Saturday, February 15, 2014

It's about Time, It's about Space

A 1960s TV series theme song began, "It's about time, it's about space...". Some thinkers, from physicists to philosophers, say it's about both, treating the two as aspects of a single space-time. Computer systems developers need to consider this when they build GIS applications.

Ontology, being the branch of philosophy concerned with describing "what exists", tackles the topics of Space and Time since they are often used to describe things. An Introduction to Ontology[1] devotes a chapter to each. As usual, things are more complicated than our initial intuition expects, and debate continues about different viewpoints. In a nutshell, the following are discussed:
  • Space is usually defined in terms of "regions".
  • Space is either absolute or relative.
  • Space is either something things "are in", or it is synonymous with the things themselves; i.e. either regions only have properties like size and location, or a region itself has the property blue if the stuff in it is blue.
  • Space is either Euclidean or not (i.e. flat or curved).
  • Space is either separate from Time, or part of the same thing: space-time.
Most programmers today, in the age of map apps, Geographic Information Systems, and geocoding, take the view that an entity such as a business or address is located at some location. Ideally, the location could be defined as a collection of regions bounded by GPS coordinates. Often, the location is (over)simplified to a single point on a map.
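Here is a minimal Python sketch of the "collection of regions" view. It assumes small areas, where treating longitude and latitude as flat x/y is good enough (the Euclidean choice from the list above), and it ignores points lying exactly on a boundary. The coordinates are made up, and a real GIS would use a geometry library. The sketch also shows why collapsing a location to one point is risky: the average of the vertices of a two-parcel site can fall outside both parcels.

```python
from dataclasses import dataclass

Point = tuple[float, float]  # (longitude, latitude), treated as flat x/y for small areas

def point_in_polygon(p: Point, ring: list[Point]) -> bool:
    """Ray casting: count edges crossed by a ray from p toward +x; odd means inside.
    Points exactly on an edge may go either way."""
    x, y = p
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):  # this edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

@dataclass
class Location:
    """An entity's location as a collection of regions, not a single point."""
    regions: list[list[Point]]

    def contains(self, p: Point) -> bool:
        return any(point_in_polygon(p, r) for r in self.regions)

    def naive_point(self) -> Point:
        """The 'single pin on a map' simplification: average of all vertices."""
        pts = [v for r in self.regions for v in r]
        return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

# Hypothetical business occupying two separate parcels.
site = Location([
    [(0, 0), (1, 0), (1, 1), (0, 1)],
    [(3, 0), (4, 0), (4, 1), (3, 1)],
])
pin = site.naive_point()
print(pin, site.contains(pin))  # (2.0, 0.5) False
```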

While it is recognized that many problems exist with actual databases of geocoded entities, it is usually assumed that they are in the realm of epistemology rather than ontology. In other words, it is assumed the problem is with "our knowledge" due to inaccuracies in the set of GPS coordinates; not that locations don't actually have a definite set of coordinates.

However, not every entity that takes up space has a well-defined and unchanging mapping to a set of GPS coordinates.  ZIP codes, for example, are not defined in terms of geography but rather as collections of delivery routes. Another example, as shown in the title insurance case study below, is in real estate legal descriptions. In addition to a knowledge problem caused by ambiguous language used in these descriptions, they can also refer to ephemeral landmarks.

Data model designs often naively assume that space is separate from time, but entities like ZIP codes and legal descriptions require a time dimension to be completely accurate. It turns out that the mapping of ZIP codes to postal routes changes several times a year, and landmarks referred to in property descriptions can change location and shape over time.
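A common way to add that time dimension is a valid-time table: each fact carries the interval during which it held, and queries ask what was true "as of" a date. The minimal Python sketch below uses a hypothetical ZIP code, route IDs, and dates; real USPS data is not modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class ZipRoute:
    """One row of a valid-time table: this route belonged to this ZIP code
    from valid_from (inclusive) until valid_to (exclusive; None = still true)."""
    zip_code: str
    route_id: str
    valid_from: date
    valid_to: Optional[date] = None

def routes_as_of(zip_code: str, when: date, table: list[ZipRoute]) -> set[str]:
    """Which delivery routes made up this ZIP code on the given date?"""
    return {
        row.route_id
        for row in table
        if row.zip_code == zip_code
        and row.valid_from <= when
        and (row.valid_to is None or when < row.valid_to)
    }

# Hypothetical ZIP code and route IDs; one route is swapped out mid-year.
table = [
    ZipRoute("99999", "C001", date(2014, 1, 1)),
    ZipRoute("99999", "C002", date(2014, 1, 1), date(2014, 7, 1)),
    ZipRoute("99999", "C003", date(2014, 7, 1)),
]
print(sorted(routes_as_of("99999", date(2014, 3, 1), table)))  # ['C001', 'C002']
print(sorted(routes_as_of("99999", date(2014, 9, 1), table)))  # ['C001', 'C003']
```

Half-open intervals (start inclusive, end exclusive) let one row end exactly where the next begins without overlapping, so "as of July 1" has a single answer.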

Case Study: TICOR Title Insurance System
OMEX, a startup, was an early pioneer of optical disk technology for data storage. It took on a contract to produce a computer system to support TICOR, then the largest title insurance company in the U.S. TICOR itself had the contract to keep backup copies of all the real estate transactions filed with Los Angeles County. As part of archiving copies of the documents, it was free to use the information in them to support its title insurance business.

The computer system was to replace using microfilm photos of the documents with optical disk storage of the images.  It would link these images with a structured database of information related to each property. One of the goals of the database was to enable answering basic questions about property locations.

The programmers, having a naive notion of how property boundaries were defined, were surprised to find that a common method is “metes and bounds”, which uses plain English descriptions referring to landmarks. For example: "beginning with a corner at the intersection of two stone walls near an apple tree on the north side of Muddy Creek road one mile above the junction of Muddy and Indian Creeks, north for 150 rods to the end of the stone wall bordering the road, then northwest along a line to a large standing rock on the corner of the property now or formerly belonging to John Smith, thence west 150 rods to the corner of a barn near a large oak tree, thence south to Muddy Creek road, thence down the side of the creek road to the starting point."

As can be seen, it would be difficult to translate this into a collection of GPS coordinates. But even if you did, you would not be done with the problem. Just as ZIP codes change over time, so do the location and shape of creeks, rivers, and other landmarks. Lest you think this is a merely theoretical problem, states have for centuries sued each other over land ownership because border rivers migrated over time [2].
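To see where the difficulty lies, here is a hypothetical Python sketch. It stores each leg ("call") of such a description as its original words plus, where stated, a compass bearing and a distance in rods (16.5 feet each), then walks the legs to compute corner points. It assumes bearings can be read as azimuths measured clockwise from true north, which old surveys did not necessarily use. The results are in feet relative to the starting corner, not GPS coordinates, and pinning down that starting corner is itself a landmark problem.

```python
import math
from dataclasses import dataclass
from typing import Optional

FEET_PER_ROD = 16.5  # a surveyor's rod is 16.5 feet

@dataclass
class Call:
    """One leg of a metes-and-bounds description (hypothetical model)."""
    text: str                            # the original words, always kept
    azimuth_deg: Optional[float] = None  # degrees clockwise from north, if stated
    rods: Optional[float] = None         # distance, if stated

def traverse(calls: list[Call], start: tuple[float, float] = (0.0, 0.0)):
    """Walk the calls from a local origin (x east, y north, in feet).
    Stops at the first call lacking a bearing or a distance, returning the
    points resolved so far and the calls left unresolved."""
    x, y = start
    points = [(x, y)]
    for i, call in enumerate(calls):
        if call.azimuth_deg is None or call.rods is None:
            return points, calls[i:]
        dist = call.rods * FEET_PER_ROD
        theta = math.radians(call.azimuth_deg)
        x += dist * math.sin(theta)
        y += dist * math.cos(theta)
        points.append((x, y))
    return points, []

calls = [
    Call("north for 150 rods to the end of the stone wall bordering the road", 0.0, 150),
    Call("northwest along a line to a large standing rock", 315.0),  # no distance given
    Call("west 150 rods to the corner of a barn near a large oak tree", 270.0, 150),
    Call("south to Muddy Creek road", 180.0),
    Call("down the side of the creek road to the starting point"),
]
points, unresolved = traverse(calls)
print(points)               # [(0.0, 0.0), (0.0, 2475.0)]
print(unresolved[0].text)   # northwest along a line to a large standing rock
```

The walk gets one leg in, reaching the end of the stone wall 2,475 feet north of the start, then stops: the bearing to the rock is known, the distance is not, and everything after that depends on a rock, a barn, and a road. Keeping the original text alongside the parsed fields lets the record stay correct even when the parse cannot.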

Ultimately, the computer system wound up just using unstructured text fields to contain the legal description rather than the more ambitious GIS database they had originally promised.

[1] An Introduction to Ontology, Nikk Effingham, Polity Press, 2013
[2] A River Runs Thru It, How the States Got Their Shapes, History Channel, 2011