published: Thu, 6-May-2004 | updated: Mon, 16-May-2005
Continuing my trek through the Google search phrases that are used to access my web site, today I'm going to discuss "simple shuffle algorithm," a phrase that was used a few days ago.
Shuffling is the antithesis of sorting. With shuffling the idea is to disorganize a set of objects, compared with the objective of sorting, which is to organize them in order.
Most people, when asked to shuffle the items in an array (say), come up with the following algorithm. Loop through the items in the array. For each item, generate a random index into the array, and then swap the current item with this random item. (For those not familiar with TList in the Delphi 7 code that follows, it's a zero-based array-like class that stores pointers.)
procedure SimpleListShuffle(aList : TList); var Inx : integer; RandomInx : integer; TempPtr : pointer; begin for Inx := 0 to aList.Count - 1 do begin RandomInx := Random(aList.Count); if (RandomInx <> Inx) then begin TempPtr := aList[Inx]; aList[Inx] := aList[RandomInx]; aList[RandomInx] := TempPtr; end; end; end;
If you analyze this algorithm, every item could be swapped with any other. Items that have already been swapped may be swapped again, and so on. If there are n items, you could potentially generate n^n different shuffles. Unfortunately, there are only n! possible different shuffles, and n^n > n! for n > 1, so we must be generating certain shuffles many times. (See here for a simple discussion of this argument. In this article, Listing 5.2 is equivalent to the first code snippet above, and Listing 5.3 is equivalent to the one below.)
The fault in this algorithm is simple. Instead of generating a random index into the whole array, we should generate an index into the part of the array that hasn't been visited yet. So for the first item we'll generate an index from 0 to n - 1, i.e. n possible values; for the second item we'll generate an index from 1 to n - 1, i.e. n - 2 possible items; and so on. Overall we'll generate 1 out of n! possibl;e shuffles (however, see note below).
I traditionally code this in reverse order, from the end of the array to the start: it makes the math a smidgeon easier.
procedure ListShuffle(aList : TList); var Inx : integer; RandomInx : integer; TempPtr : pointer; begin for Inx := aList.Count - 1 downto 1 do begin RandomInx := Random(Inx + 1); if (RandomInx <> Inx) then begin TempPtr := aList[Inx]; aList[Inx] := aList[RandomInx]; aList[RandomInx] := TempPtr; end; end; end;
With that out of the way, let's analyze the situation where shuffling is most often used: shuffling a virtual deck of cards for a card game. Here there are 52 items to shuffle and the number of possible shuffles is therefore 52!, which is about 8.1 * 10^67. With Delphi's traditional random number generator (which is a linear congruential generator), the seed to initialize the generator and the cycle length are intimately linked. The reason is that the generator generates the next random number based on the current value of the seed and saves the new value of the seed immediately afterwards. There are 4.3 * 10^9 possible seeds (it's a full 32-bit value), which means only 4.3 * 10^9 of these 8.1 * 10^67 possible shuffles could be generated.
In some situations, this isn't too bad. For example, I play a very good implementation of Solitaire on my Sony Clié, one that allows you to set the initial seed (it's a linear congruential generator as well), but I doubt that I'll ever exhaust all four billion possible games in my lifetime!
In .NET and the CLR, the random number generator is different. It's an additive generator, described by Knuth in The Art of Computer Programming, Volume 2: Seminumerical Algorithms (although it seems that the CLR team implemented the one in Numerical Recipes by Press et al. It has a very long cycle, this time about 3 * 10^25, and the seed is a 31-bit value (it's actually an int, but the absolute value is taken before it's used). The seed is used to populate a 55 item array of integers, from which items are "combined" to form random numbers. The nice thing is that each starting seed will produce a different cycle of random numbers so, given enough time, it is possible to generate about 6 * 10^34 shuffles, or about half of the total. The bad thing about it is that you can only populate the internal array through the seed value; there's no way to populate it through other meana.
An interesting issue occurs in on-line card games. There was a celebrated case where an on-line poker game used the standard Delphi Random routine to generate the shuffle. To aid in randomizing the shuffle, at the beginning of every deal, the generator was initialized with the Randomize procedure, which takes the time from midnight in milliseconds as the seed for the generator.
Now, consider this: there are 86,400,000 milliseconds in a day, so the game operated with only 86 million possible seeds and hence only 86 million possible shuffles. Given that it was possible to know what time the on-line game's computer was set at, you could reduce that number even further to maybe 120,000 possible seeds (allowing for 1 minute's worth of error either way). From that it's remarkably easy to calculate all 120,000 possible deals and to match one with the cards that are dealt that you can see. And from that, you knew everyone's hand.