Tuesday, August 11, 2009

Faces with je ne sais quoi

A couple of years ago, I mused on the (possibly fatal) limitations of having Rationalism as the fundamental basis of the Semantic Web. A recent article in the Communications of the ACM added another example to the list of things that can be "known" but can't reliably be put into words. In Face Recognition Breakthrough[1], a new technique was presented that ran significantly faster and with more accuracy than traditional face-recognition techniques. The basis of the technique was so surprising and counter-intuitive that, as recently as 2007, papers about it were rejected by mainstream computer vision conferences.

In very simplistic terms, the new technique finds the most compact way to represent the pixels of a picture (of, say, a face) by throwing them at a random set of numbers and seeing what sticks. That compressed data is compared directly with compressed versions of other face pictures to find the closest match. (If you really want the gory math details, see this video lecture.[2]) What it DOESN'T do is all the traditional figuring out of where the eyes and mouth and nose and ears are, and calculating the relationships between their locations, distances, etc. In other words, it doesn't work by analyzing a face into words/concepts (eye, nose, mouth, etc.) and specifying relationships between them. It DOES do weird math using random numbers that is irrational in the literal sense of the word. And apparently this weird math not only works, it works better!
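To make "throwing pixels at random numbers" a bit more concrete, here is a toy sketch in Python. It is only my simplified reading of the idea, not the method from the article: it multiplies each picture by a random matrix to shrink it, then picks the gallery picture whose shrunken version is nearest. The real technique also leans on sparse representation, which this sketch leaves out entirely. The function names, the 400-pixel "faces" and the 20-number compressed size are all made up for illustration.

```python
import random

def make_projection(n_pixels, n_features, seed=0):
    """Build a random Gaussian matrix: n_features rows of n_pixels weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_pixels)]
            for _ in range(n_features)]

def compress(pixels, projection):
    """Shrink a pixel list to one number per row of the random matrix."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in projection]

def closest_match(query, gallery, projection):
    """Return the gallery name whose compressed picture is nearest the query's."""
    q = compress(query, projection)
    best_name, best_dist = None, float("inf")
    for name, pixels in gallery.items():
        g = compress(pixels, projection)
        # Squared Euclidean distance between the two compressed pictures.
        dist = sum((a - b) ** 2 for a, b in zip(q, g))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Made-up data: three random 400-pixel "faces" (assumption, not real images).
data_rng = random.Random(42)
gallery = {name: [data_rng.random() for _ in range(400)]
           for name in ("alice", "bob", "carol")}

# A slightly noisy new picture of one of the gallery faces.
query = [p + data_rng.gauss(0.0, 0.05) for p in gallery["alice"]]

projection = make_projection(n_pixels=400, n_features=20)
print(closest_match(query, gallery, projection))
```

Notice that nothing in there knows what an eye or a nose is. The only "knowledge" is 20 numbers per picture, and none of those numbers means anything you could put a name to.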

The 2007 rejection of the papers as presenting outlandish claims rested on the same bias that rationalism has: if you can't put something into words, much less rational arguments, it's not true, and it's not knowledge. Just as neural nets do, the mechanics of sparse representation and compressed sensing encode "knowledge" in a form that is completely unintelligible to us humans when we look at the "raw data". And while techniques that DO use more human-reason-friendly ideas are available, they often don't work as well.

[1] http://portal.acm.org/citation.cfm?id=1536616.1536623
[2] http://content.digitalwell.washington.edu/msr/external_release_talks_12_05_2005/15994/lecture.htm
Note: use the IE browser to view the lecture; you can also skip past the first 46 minutes of theory to go directly to the applications.