PCPlus 277: Dictionaries and hash tables

I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there’s an Xmas issue as well — so it’s a bit more than monthly). The column is called Theory Workshop and appears in the Make It section of the magazine. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so. What I’ll do is publish the article from a year ago or so here when I purchase the current issue.

PCPlus logoJanuary 2009 was an important change for me. It seems that the Editor was pleased enough with my pieces (and I presume so were the readers) that my commission each month expanded to three pages instead of the prior two (or, if you’re counting, 2000 words instead of 1300). Not quite a 50% increase in pay to go along with the 50% increase in surface area — ha! — but to be quite honest I didn’t particularly care, and still don’t: I really enjoy writing for them and the money they pay is pretty good anyway. The change in word count meant that I could start to do my topics in more depth. Before I would sometimes be struggling to contain the topic in the space, but now the extra room allowed me to cover more detail.

The first article in this new expanded section had to be a good one. I decided to cover one of my favourite data structures: the hash table or dictionary.

Even with the extra space, all I could cover was the basic hash table together with linear probing as a collision resolution mechanism (and example of open addressing) and the problems of clustering. Hash tables with open addressing is still one of my favourite ways of implementing a dictionary, and so writing the article was fairly quick.

I liked doing the figures too, although Figure 2 is a bit bizarre without some explanation (time is meant to be read from top to bottom, first we insert a record whose hash resolves to index 2, then one at 17, then one at 11, etc; the more we add records, the more collisions there are and they tend to cluster). The figure cries out for animation, as I discovered recently when I slipped in a couple of images of hash tables to my CTO video on seeing things in black and white. There I wanted to show how disruptive it can be when a hash table grows, and it too needed animation, especially in a video like that. I’ve since found out that Illustrator can produce Flash animations from a series of layers, etc, but I haven’t had a chance to play around with that as yet.

Double-take: a hash table “disruptive”? Indeed, yes, under certain situations. Much is made of the fact that insertion and search in a hash table is O(1), that is, it’s constant time whether there are 10 items in the table or 10,000. Not strictly true, as it happens, it’s more of an amortized O(1) over many insertions or searches. The reason is that, during an insertion, there may be a point when the load factor is too high (as explained in the article, for open addressing that’s assumed to be about 2/3 full), and the hash table has to be grown. This requires allocating a new array (traditionally it’s set to twice the size of the original), and then rehashing and inserting all of the current items into the new array. This is an O(n) operation and it happens every n items, so, amortized, it smears out to a constant addition factor over all n items. Whoopee for the big-Oh notation, but in practice, if you have a hash table containing a huge number of items, the time taken for this growth may be noticeable by the user or by a real-time process that’s not expecting it.

This article first appeared in issue 277, January 2009.

You can download the PDF here.

(Quick aside: PCPlus used to put part of their archive as PDFs on the DVD in the back of the magazine. They’ve now moved to a CD instead of a DVD, presumably to save on costs, and the archive is no longer on there. I hear they’re going to publish it online instead, sometime in the near future.)

Loading similar posts...   Loading links to posts on similar topics...

No Responses

Feel free to add a comment...

Leave a response

Note: some MarkDown is allowed, but HTML is not. Expand to show what's available.

  •  Emphasize with italics: surround word with underscores _emphasis_
  •  Emphasize strongly: surround word with double-asterisks **strong**
  •  Link: surround text with square brackets, url with parentheses [text](url)
  •  Inline code: surround text with backticks `IEnumerable`
  •  Unordered list: start each line with an asterisk, space * an item
  •  Ordered list: start each line with a digit, period, space 1. an item
  •  Insert code block: start each line with four spaces
  •  Insert blockquote: start each line with right-angle-bracket, space > Now is the time...
Preview of response