I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK. (Actually there are 13 issues a year, since there's an Xmas issue as well, so it's a bit more than monthly.) The column is called Theory Workshop and appears in the Make It section of the magazine. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after a year or so. So whenever I buy the current issue, I'll publish the article from about a year earlier here.
This particular piece was a pure layman's article about how to index text and, in particular, how the big search engines index web pages. I covered the usual suspects: inverted indexes and PageRank, with asides on stemming and SEO (search engine optimization).
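The full explanation is in the article itself, but as a rough illustration of the first of those usual suspects, here is a minimal Python sketch of an inverted index. It assumes a tiny in-memory collection of documents keyed by hypothetical integer IDs, a naive tokenizer, and a simple AND query. The function names are made up for this example, and real search engines do far more (storing word positions, compressing posting lists, ranking results).

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each word to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def search(index, query):
    """Return the IDs of documents containing every query word (an AND search)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        # Intersect posting lists: a document must contain all the words.
        result &= set(index.get(word, []))
    return sorted(result)

docs = {
    1: "The anatomy of a search engine",
    2: "Search engines index web pages",
    3: "The web is large",
}
index = build_inverted_index(docs)
print(index["web"])                 # [2, 3]
print(search(index, "web search"))  # [2]
print(search(index, "engine"))      # [1]
```

Note that the query "engine" does not find document 2, which only contains "engines". That is exactly the gap stemming is meant to close: reducing both words to a common stem before indexing would let one query match both forms.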
As it happens, in doing the research for this article, I read Sergey Brin & Larry Page's seminal paper The Anatomy of a Large-Scale Hypertextual Web Search Engine for the first time. This was the paper that essentially launched Google and changed the landscape of search engines. The techniques it discusses have obviously improved in the 12 years since then (I dare say that Google no longer relies on PageRank alone but instead uses a panoply of different indexing mechanisms to improve its results), but it is still an excellent exposition of what happens inside a large-scale search engine.
And... 12 years ago? How the internet has changed since Brin and Page presented their paper at the Seventh International World-Wide Web Conference in 1998!
This article first appeared in issue 281, May 2009.
You can download the PDF here.
Massive Attack - Babel