Saturday, February 26, 2000

You can't surf the same Web twice

As Heraclitus said in approximately 500 BC (as quoted by Plato),
"You could not step twice into the same river; for other waters are ever flowing on to you."
Likewise, you cannot access the web as it existed yesterday, much less as it was a year ago.
There is no time on the web; only a fuzzy "now".

This makes it important for web site designers to be very clear about which portions of their content are static and which are dynamic.  The API to this database of content (i.e. the web site itself) is the set of URLs published to access that content.  Just as interfaces cause great problems if they change frequently (or have no precise definition of what they return), URLs need stability and precision-of-definition too!  If the same content is to be made available for a long duration, then its URL should not change, and it should always return the same version of the content.  On the other hand, if content is defined to be the current value of some query (e.g. the current temperature, or the current version of the home page), then that exact same URL should ALWAYS return the current value.

Just as with software engineering, this means that the Interface needs to be separated from the Implementation, so that the Interface can remain stable even when the Implementation changes.  So URLs should be designed and specified independently of which host machine, programming language, web framework, etc. are used to implement the site, because those will change over time.
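As a minimal sketch of this idea, suppose a site commits to two kinds of URLs: a dated URL that forever returns one frozen version, and an undated URL that always returns the current version. The paths, dates, and content below are all hypothetical; the point is only that the URL scheme (the Interface) is fixed, while the lookup behind it (the Implementation) is free to change:

```python
# Hypothetical version store: (path, date) -> content.
# In a real site this could be a database, a file tree, anything --
# the URL scheme promises nothing about how it is implemented.
versions = {
    ("articles/stable-urls", "2000-02-26"): "First draft.",
    ("articles/stable-urls", "2000-03-04"): "Revised draft.",
}

def resolve(path):
    """Resolve a URL path to content under the stable scheme.

    /articles/stable-urls/2000-02-26  -> that exact, frozen version
    /articles/stable-urls             -> the current (latest) version
    """
    parts = path.strip("/").rsplit("/", 1)
    # A trailing component that starts with digits is treated as a date:
    # the frozen, never-changing version of the content.
    if len(parts) == 2 and parts[1][:4].isdigit():
        return versions.get((parts[0], parts[1]))
    # Otherwise the URL means "the current value": latest date wins.
    key = path.strip("/")
    dates = sorted(d for (p, d) in versions if p == key)
    return versions[(key, dates[-1])] if dates else None

print(resolve("/articles/stable-urls/2000-02-26"))  # the frozen 2000-02-26 version
print(resolve("/articles/stable-urls"))             # whatever is current
```

A book could then cite the dated URL and trust it years later, while readers who want "now" use the undated one.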

Some web sites have embraced this notion in part by providing "permalinks", but unfortunately a permalink often refers only to the current version of the item.  Earlier drafts, editions, and revisions that were published (and cited in magazines, books, and other web sites) are not retrievable even with permalinks.

While the WWW is always changing, there should be ways for books to cite unchanging web content such that it can still be retrieved years later. Stable URLs are the solution.


POSTSCRIPT - Jan 3rd, 2009
Off and on over the history of the web, many have advocated giant internet archival sites to solve the problem of stable URLs to frozen data. It can be seen in retrospect that this has not worked because:
a) Many archival sites have died, collapsing under the sheer weight of data.
b) Without a stable URL scheme implemented by each web site, one still doesn't know how to get to the latest version of a web page versus the version that existed 3 years ago.


POSTSCRIPT - May 12th, 2009
There is a new book out by Dan Bricklin that republishes his 10 years' worth of blogs. He talks about this very problem in his book and this interview.  He praises archive.org, which is the new location of the archives that search engines once linked to directly in their search results.


POSTSCRIPT - April 20th, 2010
There is a new program on IT Conversations that talks about yet another new proposal to fix the problem of the web having no time dimension, i.e. no way to say "show me cnn.com as it looked 6 months ago". See http://itc.conversationsnetwork.org/shows/detail4456.html