JavaScript: shuffling by sorting : Algorithms for the masses

JavaScript: shuffling by sorting

I’ll admit this one is really wacky, so sit yourself down and read through this slowly. It’s OK if you need to take a break for a breather and to clear your mind. I had to the first time I came across this little, er, hack.

Shuffling cards The situation is this: you have an array and you need to shuffle the items in the array. Now, being the Knuth fan that I am, I would immediately reach for Volume 3 of The Art of Computer Programming, check what Knuth has to say about it, and code it up:

var shuffle = function (a) {
  var randomIndex, temp, i = a.length;
  while (--i) {
    randomIndex = Math.floor(i * Math.random());
    if (randomIndex !== i) {
      temp = a[i];
      a[i] = a[randomIndex];
      a[randomIndex] = temp;
    }
  }
};

In essence (and ignoring some error checking): counting down from the top of the array, select a random element in what’s left of the array, and swap over the elements. Knuth states and proves why this is the best way of shuffling an array.

Imagine my surprise when, during some surfing session one day, I came across essentially this version of a shuffle:

var shuffle = function (a) {
  a.sort(function () {
    return Math.random() - 0.5;
  });
};

Using the sort function to shuffle? Shoot me now.

Let’s look at this closely. The sort function can take a function as its first parameter. This function is supposed to take two parameters, a and b, and return a numeric value. If this is less than zero then a is sorted to a lower index than b (if you like, in some sense, a < b). If it returns a number greater than zero, the opposite occurs and b is sorted to an index lower than a. If equal, the elements are not moved with respect to each other (well, in theory, but in practice this tends not to happen; in essence, obeying this rule is the difference between a stable and an unstable sort). If this comparison function is missing, the sort function converts each pair of elements to strings and does a lexical comparison between them. This leads to weird effects like this:

var a = [1, 2, 4, 3, 7, 8, 60, 5];
a.sort();
console.log(a) // prints [1, 2, 3, 4, 5, 60, 7, 8]

…because the string “60” is lexically less than the string “7”. Ouch. BEWARE!

Anyway, what our funky comparison function is doing is returning a random number between –0.5 and 0.5. Half the time it’ll return a number less than zero (so the pair of elements are sorted in the “less than” sense) and half the time a number greater than zero (so the pair of elements are sorted “greater than”). And I would have to say it seems to work, although it scares the heck out of me.

Why? Well, for a start the browser’s implementation of sort is in all probability done with quicksort. Which particular flavor of quicksort is unknown (the different flavors would be how the pivot element is chosen). Not matter how it’s done exactly, one of the overriding assumptions being made in the implementation is that the comparison function does not lie. If a is less than b, the comparison will return less than zero, every single time. If it does lie, neither you nor I have any idea how it would behave.

Quick story to illustrate my misgivings. About 15 years ago, perhaps more, I used to be in charge of supporting a Delphi library called B-Tree Filer. It had a procedure that sorted a virtual array using a merge sort implementation. One of the parameters was called LessThan, and it was supposed to be a function taking two parameters and returning true if the first was less than the second, false otherwise. All of my support issues dealing with this procedure (“Hey, it’s not sorting properly. Your code sucks!”) were easily solved when I explained that “less than” did not mean “less than or equal”. Since then I’ve been ultra careful when dealing with sorts and quicksort is just one of those temperamental algorithms you handle with kid gloves.

So, my first point is this: shuffling via sorting may work in this browser and previous versions, but it’s not necessarily going to work in the next. You may be bitten and never realize it.

The second point to make is that fast sorts are O(n × log n), so so shuffling via sorting will also have this performance characteristic. Sounds fast enough, but the proper shuffle algorithm I presented above is O(n). It’s faster for bigger arrays (small arrays are too small to make a difference). Why use a slower algorithm when the correct algorithm is faster?

Now playing:
New Order - 60 Miles an Hour
(from Get Ready)

Thu 25-Aug-2011 6:55 PM JavaScriptLessons / tags: shuffle sort

Loading links to posts on similar topics...

previous post next post

3 Responses

#1 Marcel Popescu said...

26-Aug-11 1:58 AM

... because the correct algorithm is SLOWER... in programmer time, which is what counts most. The "sort randomly" algorithm can be remembered much easier than the other one.

I'm not saying you SHOULD choose the simpler algorithm, by the way; I'm just providing a possible answer to your question :)

#2 Ryan said...

26-Aug-11 8:40 AM

@Marcel Any decent programmer should know that reinventing the wheel when it comes to sorting, shuffling, searching, or any other core CS concept is wasting their time. There are dozens, if not hundreds, of websites and books where you can look up algorithms.

Special care is needed in weakly typed languages like Javascript for the reasons Julian pointed out.

#3 julian m bucknall said...

26-Jun-14 6:59 PM

All: OK, I just found an extremely interesting take on the "shuffle-by-sorting" algorithm where it's shown that -- essentially -- it produces a crappy random shuffle. Check it out here.

The correct one (as in, it produces a better random shuffle) is the first one I mention in this article.

Cheers, Julian

Leave a response

Note: some MarkDown is allowed, but HTML is not. Expand to show what's available.

Emphasize with italics: surround word with underscores _emphasis_
Emphasize strongly: surround word with double-asterisks **strong**
Link: surround text with square brackets, url with parentheses [text](url)
Inline code: surround text with backticks `IEnumerable`
Unordered list: start each line with an asterisk, space * an item
Ordered list: start each line with a digit, period, space 1. an item
Insert code block: start each line with four spaces
Insert blockquote: start each line with right-angle-bracket, space > Now is the time...

by Julian M Bucknall