NiftyNews

From brainsik

Word burstiness (Wed, 23 Apr 2003 1pm MDT)

An announcement article [1] and sample results [2] describe an algorithm developed by Kleinberg that discovers word bursts in text streams. What matters is that the algorithm uses the temporal order of the stream to detect when a word suddenly starts being used much more often. He ran the algorithm over all of the State of the Union addresses, and it picked out many of the important topics of each era (british, slavery, depression, atomic, vietnam, ...).

  1. http://www.news.cornell.edu/releases/Feb03/AAAS.Kleinberg.bursty.ws.html
  2. http://www.cs.cornell.edu/home/kleinber/kdd02.html
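
To make the idea of a "burst" concrete, here is a rough sketch in Python. It is not Kleinberg's algorithm; it swaps in a much cruder heuristic that flags a word when its usage rate in one document is several times its rate across all earlier documents. The function name find_bursts and the parameters min_ratio and min_count are my own inventions for illustration.

```python
from collections import Counter


def find_bursts(documents, min_ratio=3.0, min_count=2):
    """Flag words whose usage rate jumps relative to earlier documents.

    documents: list of (label, text) pairs in chronological order.
    Returns a list of (label, word) pairs.

    Simple ratio heuristic for illustration only, not Kleinberg's method.
    """
    seen = Counter()   # word counts over all earlier documents
    seen_total = 0     # total number of words in earlier documents
    bursts = []
    for label, text in documents:
        words = text.lower().split()
        counts = Counter(words)
        total = len(words)
        # The first document has no history to compare against.
        if seen_total > 0 and total > 0:
            for word, n in counts.items():
                if n < min_count:
                    continue  # ignore words used only in passing
                rate_now = n / total
                # Add-one smoothing gives unseen words a nonzero baseline.
                rate_before = (seen[word] + 1) / (seen_total + 1)
                if rate_now >= min_ratio * rate_before:
                    bursts.append((label, word))
        # Only now does this document become part of the history.
        seen.update(counts)
        seen_total += total
    return bursts
```

The point to notice is that order matters: each document is compared only against what came before it, so shuffling the documents would change the answer.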

I think this work is cool because the algorithm focuses so heavily on the temporal nature of the documents. Most of the searches we do seem to be mostly or entirely spatially based. [I define temporal and spatial below.] Google, although it does weight sites by how often they update, seems to base its search results mainly on the spatial structure of the web. For some searches, a more temporally focused result would probably be better. For example, if you are interested in finding the newest ideas as opposed to the most popular ideas, you might need to care more about the temporal relationships between documents than their spatial relationships.

What do I mean by temporal and spatial? By temporal, I mean that it matters when a word or document appears in time. By spatial, I mean a graph of relationships connecting documents. For example, a graph could be made by drawing lines between documents that hyperlink to one another, or by connecting papers that cite one another. I consider these relationships "spatial" because they can form multi-dimensional spaces in which you can measure quantities like distance.
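
As one possible reading of "distance" in the spatial sense, here is a small Python sketch that counts the fewest link hops between two documents using breadth-first search. The function name link_distance and the dictionary format for the links are assumptions of mine, and treating links as two-way is a simplification.

```python
from collections import deque


def link_distance(links, start, goal):
    """Return the fewest link hops between two documents, or None.

    links: dict mapping a document name to the documents it links to.
    Links are treated as undirected (a simplifying assumption).
    """
    # Build a two-way adjacency map from the one-way link lists.
    neighbors = {}
    for src, targets in links.items():
        neighbors.setdefault(src, set())
        for dst in targets:
            neighbors[src].add(dst)
            neighbors.setdefault(dst, set()).add(src)

    if start not in neighbors or goal not in neighbors:
        return None

    # Breadth-first search visits documents in order of hop count.
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbors[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two documents are not connected
```

Nothing here depends on when a document was written, which is exactly what makes this measure spatial rather than temporal.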

If anyone has other definitions, I'd like to hear them.


Last Edit: Wed, 23 Apr 2003 12:06:09 -0700
Revisions: 2