A data example of this problem is clustering many individual name/address records into identities despite variation among the names and addresses. Sometimes it is ambiguous which identity "owns" a particular name/address, because the fuzzy blob of one identity cluster overlaps the fuzzy blob of another. How can we tell which one it belongs with? And why do we even think there are two overlapping blobs rather than one oddly shaped blob?
AHA - Look at velocity!
Determining which points belong to which overlapping fuzzy regions is hard when looking at a static picture, but it is easy when there is movement. To decide which stars belong to which of two colliding galaxies, we look at each star's velocity to see which galaxy it is moving with.
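To make the galaxy picture concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the function name `assign_by_velocity`, the two galaxies "A" and "B", their bulk velocities, and the four stars. The stars sit at nearly the same positions, so position alone cannot separate them, but each one is labeled with the galaxy whose bulk velocity is closest to its own.

```python
import math

def assign_by_velocity(stars, galaxy_velocities):
    """Label each star with the galaxy whose bulk velocity is nearest to the star's velocity.

    stars: list of (position, velocity) pairs, each a 2-D tuple.
    galaxy_velocities: dict mapping galaxy name -> bulk velocity tuple.
    Positions are carried along but deliberately ignored.
    """
    labels = []
    for _position, velocity in stars:
        nearest = min(
            galaxy_velocities,
            key=lambda name: math.dist(velocity, galaxy_velocities[name]),
        )
        labels.append(nearest)
    return labels

# Hypothetical data: two galaxies passing through each other in opposite directions.
galaxy_velocities = {"A": (1.0, 0.0), "B": (-1.0, 0.0)}

# Hypothetical stars: overlapping positions, distinct velocities.
stars = [
    ((0.1, 0.2), (0.9, 0.1)),
    ((0.0, 0.1), (-1.1, 0.0)),
    ((0.2, 0.0), (1.2, -0.1)),
    ((0.1, 0.1), (-0.8, 0.2)),
]

labels = assign_by_velocity(stars, galaxy_velocities)
print(labels)  # ['A', 'B', 'A', 'B']
```

The sketch assumes the bulk velocity of each galaxy is already known. In a real problem that would have to be estimated too, and it says nothing yet about what "velocity" would mean for a name/address.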
So, can this be applied to data? Is there some "velocity" that can be determined for each data point such that it can be associated with the "proper" data cluster? Is there a velocity associated with a name/address instance?
[1] "I Am a Strange Loop", 2007, Hofstadter, Basic Books