PCPlus 274: Choosing random samples : Algorithms for the masses

PCPlus 274: Choosing random samples

I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there's an Xmas issue as well — so it's a bit more than monthly). The column is called Theory Workshop and appears in the back of every issue. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so. After all, the PDFs do appear on each issue's DVD after a few months. When I buy the current issue, I'll publish the article from the issue a year ago. I bought November's issue this lunchtime, so here's November 2008's article.

The premise of this article is pretty simple: for statistical purposes you have to select, at random, n items from a very large set of them. Then, presumably, you will make some deductions about the whole set from this much smaller sample. There are many applications of this and similar algorithms; the most obvious being political polling.

It's instructive to think about the issue before reading the article. Say you have to select exactly three items at random from a set of 10. How would you go about it such that you do not skew the selection? That is, such that every item has an equal probability of being selected? (An example I give in the article is to calculate a random number between 0 and 1 for every item in the set: if it's less than 0.3 select the item, stop when you have three. However, this is very biased to the earlier items, and, indeed, you might not even get three items at all.)

If you manage that, consider the scenario where you don't know the total number of items at the outset, yet you must select 1000 of them such that each has an equal probability of being selected. Totally at random. Without counting them.

Fascinating stuff.

This article first appeared in issue 274, November 2008.

You can download the PDF here.

Now playing:
The Alan Parsons Project - Eye In The Sky
(from The Definitive Collection)

Wed 28-Oct-2009 6:35 PM PCPlus / tags: sampling polling

Loading links to posts on similar topics...

previous post next post

No Responses

Feel free to add a comment...

Leave a response

Note: some MarkDown is allowed, but HTML is not. Expand to show what's available.

Emphasize with italics: surround word with underscores _emphasis_
Emphasize strongly: surround word with double-asterisks **strong**
Link: surround text with square brackets, url with parentheses [text](url)
Inline code: surround text with backticks `IEnumerable`
Unordered list: start each line with an asterisk, space * an item
Ordered list: start each line with a digit, period, space 1. an item
Insert code block: start each line with four spaces
Insert blockquote: start each line with right-angle-bracket, space > Now is the time...

by Julian M Bucknall