PCPlus 272: Generating gobbledygook

I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year, since there's an Xmas issue as well, so it's a bit more than monthly). The column is called Theory Workshop and appears in the back of every issue. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after, say, a year or so. After all, the PDFs do appear on each issue's DVD after a few months. So, when I buy the current issue, I publish the article from the issue a year earlier. Since I've now got September's issue (and have had it for a couple of weeks), here's September 2008's article.

Pure fun this time: generating random text. The article shows that purely random text, where every character is equally likely to appear next, doesn't work particularly well. Enter Markov chains, where the probabilities of what comes next are skewed by what has just appeared. First we look at characters, starting with the next character depending only on the previous character (an order-1 Markov chain) and going all the way up to an order-10 Markov chain (the next character depends on the previous 10 characters). I particularly like the example text generated from War of the Worlds for this latter case:

BOOK ONE THE EVE OF THE WAR

No one would have left an abiding sense of smell, but it had a pair of very large dark eyes of a Martian from the Martians making their blue shirts, dark trousers, and singers.

I just love the idea of the Martians making their blue shirts and dark trousers.

Anyway, I also experimented with Markov chains that use the previous words instead of the previous characters, but in practice an order-10 Markov chain based on characters already works very well.
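To make the idea concrete, here's a minimal sketch in Python of an order-k Markov chain text generator. It is not the code from the article; the names `build_chain` and `generate` and the tiny sample text are my own assumptions for illustration. The chain maps each run of k consecutive tokens to the list of tokens that followed it in the source, so picking uniformly from that list reproduces the skewed probabilities. Pass a string to work with characters, or a list of words to work with words.

```python
import random
from collections import defaultdict


def build_chain(tokens, order):
    """Map each run of `order` consecutive tokens to the tokens that followed it.

    `tokens` may be a string (character chain) or a list of words (word chain).
    Followers are stored with repeats, so frequent followers are picked more often.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        chain[state].append(tokens[i + order])
    return chain


def generate(chain, seed, length, rng=None):
    """Extend `seed` by up to `length` tokens, using the last len(seed) tokens as state.

    `seed` should be as long as the order used to build the chain.
    Stops early if the current state never had a follower in the source.
    """
    rng = rng or random.Random()
    out = list(seed)
    order = len(out)
    for _ in range(length):
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return out


if __name__ == "__main__":
    # Hypothetical sample text; a real run would read a whole book from a file.
    text = "the martians were in the pit and the pit was in the common "
    order = 3

    # Character-based chain: the next character depends on the previous 3.
    char_chain = build_chain(text, order)
    print("".join(generate(char_chain, text[:order], 60)))

    # Word-based chain: the next word depends on the previous word.
    words = text.split()
    word_chain = build_chain(words, 1)
    print(" ".join(generate(word_chain, words[:1], 12)))
```

With a source as short as this sample the output mostly parrots the input; the amusing results only appear with a long text, and the higher the order, the longer the source needs to be before the chain has any real choices to make.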

This article first appeared in issue 272, September 2008.

You can download the PDF here.

Now playing:
Art of Noise - Beat Box (Diversion One)
(from The Best of the Art of Noise)
