Monday, April 16, 2007

What is the difference between an Ontology and a Taxonomy

The following has been posted on a few sites and I don't know which was the original. In any event, it is very germane to my purpose, so I republish it here...

A controlled vocabulary is a list of terms that have been enumerated explicitly. This list is controlled by and is available from a controlled vocabulary registration authority. All terms in a controlled vocabulary should have an unambiguous, non-redundant definition. This is a design goal that may not be true in practice. It depends on how strict the controlled vocabulary registration authority is regarding registration of terms into a controlled vocabulary. At a minimum, the following two rules should be enforced:

  1. If the same term is commonly used to mean different concepts in different contexts, then its name is explicitly qualified to resolve this ambiguity.
  2. If multiple terms are used to mean the same thing, one of the terms is identified as the preferred term in the controlled vocabulary and the other terms are listed as synonyms or aliases.
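
As a rough illustration of the two rules above, here is a minimal sketch of a registry in Python. The class name `ControlledVocabulary`, its methods, and the sample terms are all hypothetical and chosen only for this example; a real registration authority would enforce far more than this.

```python
class ControlledVocabulary:
    """Toy registry: one preferred term per concept, plus its synonyms."""

    def __init__(self):
        self._definitions = {}  # preferred term -> definition
        self._synonyms = {}     # synonym -> preferred term

    def _is_taken(self, name):
        return name in self._definitions or name in self._synonyms

    def register(self, term, definition, synonyms=()):
        # Rule 1: a name may stand for only one concept, so an ambiguous
        # name has to be qualified, e.g. "bank (finance)" vs "bank (river)".
        for name in (term, *synonyms):
            if self._is_taken(name):
                raise ValueError(f"{name!r} is already registered; qualify it")
        self._definitions[term] = definition
        # Rule 2: every alias points at exactly one preferred term.
        for synonym in synonyms:
            self._synonyms[synonym] = term

    def preferred(self, name):
        """Return the preferred term for a term or one of its synonyms."""
        if name in self._definitions:
            return name
        return self._synonyms[name]  # raises KeyError for unknown names

    def define(self, name):
        return self._definitions[self.preferred(name)]


vocab = ControlledVocabulary()
vocab.register("bank (finance)", "An institution that accepts deposits.",
               synonyms=["financial institution"])
vocab.register("bank (river)", "The land alongside a river.",
               synonyms=["riverbank"])
print(vocab.preferred("riverbank"))  # bank (river)
```
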

A taxonomy is a collection of controlled vocabulary terms organized into a hierarchical structure. Each term in a taxonomy is in one or more parent-child relationships to other terms in the taxonomy. There may be different types of parent-child relationships in a taxonomy (e.g., whole-part, genus-species, type-instance), but good practice requires that all of the parent-child relationships under a single parent be of the same type. Some taxonomies allow poly-hierarchy, which means that a term can have multiple parents. Consequently, if a term appears in multiple places in a taxonomy, it is the same term in each place. Specifically, if a term has children in one place in a taxonomy, then it has the same children in every other place where it appears.
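
One way to picture poly-hierarchy is to store each term once and record its parents and children by name, so that a term reached along different paths is necessarily the same term with the same children. The sketch below assumes an acyclic hierarchy; the `Taxonomy` class and the vehicle terms are invented for illustration.

```python
from collections import defaultdict


class Taxonomy:
    """Toy poly-hierarchy: each term is stored once, whatever its parents."""

    def __init__(self):
        self._children = defaultdict(set)
        self._parents = defaultdict(set)

    def add_child(self, parent, child):
        self._children[parent].add(child)
        self._parents[child].add(parent)

    def children(self, term):
        return sorted(self._children.get(term, ()))

    def parents(self, term):
        return sorted(self._parents.get(term, ()))

    def paths_to(self, term):
        """Every root-to-term path (assumes the hierarchy has no cycles)."""
        parents = self.parents(term)
        if not parents:
            return [[term]]
        return [path + [term]
                for parent in parents
                for path in self.paths_to(parent)]


taxonomy = Taxonomy()
taxonomy.add_child("vehicle", "car")
taxonomy.add_child("vehicle", "boat")
taxonomy.add_child("car", "amphibious vehicle")
taxonomy.add_child("boat", "amphibious vehicle")
taxonomy.add_child("amphibious vehicle", "amphibious bus")

# The term appears in two places, but its children are the same in both.
for path in taxonomy.paths_to("amphibious vehicle"):
    print(" > ".join(path), "->", taxonomy.children(path[-1]))
```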

A thesaurus is a networked collection of controlled vocabulary terms. This means that a thesaurus uses associative relationships in addition to parent-child relationships. The expressiveness of the associative relationships in a thesaurus varies and can be as simple as "related to term", as in "term A is related to term B".

People use the word ontology to mean different things, e.g. glossaries & data dictionaries, thesauri & taxonomies, schemas & data models, and formal ontologies & inference. A formal ontology is a controlled vocabulary expressed in an ontology representation language. This language has a grammar for using vocabulary terms to express something meaningful within a specified domain of interest. The grammar contains formal constraints (e.g., specifies what it means to be a well-formed statement, assertion, query, etc.) on how terms in the ontology's controlled vocabulary can be used together.
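
To make the idea of a grammar with formal constraints a little more concrete, here is a deliberately tiny sketch. It assumes the simplest possible grammar: a statement is a (subject, predicate, object) triple, and it is well-formed only if the predicate is declared and the subject and object have the types the predicate expects. The names `TinyOntology`, `Relation`, and `worksFor` are hypothetical; real ontology languages are far richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    name: str
    domain_type: str  # type required of the subject
    range_type: str   # type required of the object


class TinyOntology:
    """Toy vocabulary plus one grammar rule for (subject, predicate, object)."""

    def __init__(self):
        self.types = {}      # individual -> type name
        self.relations = {}  # predicate name -> Relation

    def declare(self, individual, type_name):
        self.types[individual] = type_name

    def allow(self, name, domain_type, range_type):
        self.relations[name] = Relation(name, domain_type, range_type)

    def is_well_formed(self, subject, predicate, obj):
        relation = self.relations.get(predicate)
        if relation is None:
            return False  # the predicate is not in the vocabulary
        return (self.types.get(subject) == relation.domain_type
                and self.types.get(obj) == relation.range_type)


ontology = TinyOntology()
ontology.declare("Joe", "Person")
ontology.declare("Acme", "Company")
ontology.allow("worksFor", "Person", "Company")

print(ontology.is_well_formed("Joe", "worksFor", "Acme"))  # True
print(ontology.is_well_formed("Acme", "worksFor", "Joe"))  # False
```

A taxonomy or thesaurus could hold the same terms, but it would have no equivalent of `is_well_formed`: nothing in it says which combinations of terms make a meaningful statement.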

People make commitments to use a specific controlled vocabulary or ontology for a domain of interest. Enforcement of an ontology's grammar may be rigorous or lax. Frequently, the grammar for a "light-weight" ontology is not completely specified, i.e., it has implicit rules that are not explicitly documented.

A meta-model is an explicit model of the constructs and rules needed to build specific models within a domain of interest. A valid meta-model is an ontology, but not all ontologies are modeled explicitly as meta-models. A meta-model can be viewed from three different perspectives:

  1. as a set of building blocks and rules used to build models,
  2. as a model of a domain of interest, and
  3. as an instance of another model.

When comparing meta-models to ontologies, we are talking about meta-models as models (perspective 2).

Note: Meta-modeling as a domain of interest can have its own ontology. For example, the CDIF Family of Standards, which contains the CDIF Meta-meta-model along with rules for modeling, extensibility, and transfer format, is such an ontology. When modelers use a modeling tool to construct models, they are making a commitment to use the ontology implemented in the modeling tool. This model-making ontology is usually called a meta-model, with "model making" as its domain of interest.

Bottom line: Taxonomies and thesauri may relate terms in a controlled vocabulary via parent-child and associative relationships, but they do not contain explicit grammar rules that constrain how controlled vocabulary terms are used to express (model) something meaningful within a domain of interest. A meta-model is an ontology used by modelers. People make commitments to use a specific controlled vocabulary or ontology for a domain of interest.

Friday, April 13, 2007

Don't confuse (OOP) Objects with Entities

AHA! (OOP language) Classes are too general to think of as synonymous with (Philosophical) Entities! OOP classes/objects can represent roles, or values of properties (e.g. Red), or PropertyTypes (e.g. Color), NONE OF WHICH are Entities. So the "isA" relationship between a superClass and its subClasses is not always analogous to the "isA" relationship between an Entity and the collection of Entities constituting an entityType.

"Red isA Color" is different than "Joe isA Human" or "Human isA Mammal".
Red is not an entity; Joe is, and so is a human. Why? Red is not countable, and one "instance" of red is not distinguishable from another; Joe, humans, and mammals are countable and distinguishable. [Ed. note: this intuition will later be recognized as the problem of universals as the author reads more books. ;-) ]

With "domain objects" (aka "business objects") being roughly equivalent to philosophical entities, it is dawning on the IT industry (e.g. POJOs, "domain driven development") that there are at least 2 kinds of classes/objects: "domain" and "non-domain".  There are wildly different ideas of what "non-domain" classes are, but they generally are meant to include all that programming implementation logic kind of stuff rather than the direct modeling of the world.

The main point here (and the AHA moment), is recognizing that lots of confusion about how to divide a problem into programming language classes (along with what logic goes into their "equals" methods) boils down to a lack of understanding about first philosophy concepts of entities, universals, etc.  We programmers confuse the ability of programming languages to implement everything as a class/object with the idea that down deep everything is the same kind of thing.
Philosophy would say that red is fundamentally different than Fred, and both are fundamentally different than a "process" (i.e. an OO method) or an "event" (i.e. an OO "message").
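
The point about "equals" methods can be sketched in a few lines. The example below is in Python, where `__eq__` plays the part of "equals", and both classes are hypothetical: `Color` is treated as a value, so two reds are interchangeable, while `Person` is treated as an entity, so two people who happen to share a name are still two people.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Color:
    """A value: equality is decided by content, so any red equals any red."""
    name: str


@dataclass(eq=False)
class Person:
    """An entity: eq=False keeps Python's default identity-based equality."""
    name: str


print(Color("red") == Color("red"))  # True: reds are not distinguishable

joe = Person("Joe")
other_joe = Person("Joe")
print(joe == other_joe)  # False: same name, two countable individuals
print(joe == joe)        # True: an entity is equal only to itself
```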

Thursday, April 12, 2007

"isA" and "asA" Relationships

In modeling the world (object-oriented, entity-relationship, etc.), the emphasis has been on distinguishing between "is-a" relationships and "has-a" relationships (human isA mammal, human hasA head). There is another fundamental relationship that is under-emphasized in modeling; namely, the "as-a" relationship. This is the relationship between an entity and a "role" that the entity can take on. Many putative "entities" (e.g. customer, employee, etc.) are not really entities at all, but are "roles" that the actual entity (e.g. a person) can take on. [see a case study here]

Roles are often implemented as classes, and multiple inheritance is used [or worse, lots of glue code is written] to gain the lexical convenience of referencing joe.employeeID and joe.resign() versus joe.employeeRole.getID() and joe.employeeRole.resign(). Of course, "static" classes can be used to encapsulate the role details, resulting in AsEmployee(joe).resign() references.
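
Here is one possible reading of the AsEmployee(joe).resign() idea, sketched in Python with a plain function standing in for the "static" class. Everything here (`Person`, `EmployeeRole`, `hire`, `as_employee`, the ID format) is a hypothetical illustration rather than an established pattern from any library.

```python
class Person:
    """The actual entity. Roles are attached to it, not inherited by it."""

    def __init__(self, name):
        self.name = name
        self.roles = {}  # role class -> role object


class EmployeeRole:
    """Role details live here, so Person needs no employee-specific members."""

    def __init__(self, person, employee_id):
        self.person = person
        self.employee_id = employee_id
        self.active = True

    def resign(self):
        self.active = False


def hire(person, employee_id):
    role = EmployeeRole(person, employee_id)
    person.roles[EmployeeRole] = role
    return role


def as_employee(person):
    """View a person "as an" employee, or fail if they hold no such role."""
    try:
        return person.roles[EmployeeRole]
    except KeyError:
        raise TypeError(f"{person.name} does not hold the employee role") from None


joe = Person("Joe")
hire(joe, "E-17")
as_employee(joe).resign()
print(as_employee(joe).employee_id, as_employee(joe).active)  # E-17 False
```

The person stays a single entity throughout; taking on or dropping a role changes what is attached to Joe, not what kind of thing Joe is.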