C# Algorithms Book Review

published: Sat, 23-Feb-2008   |   updated: Thu, 28-Feb-2008
Book

This week Amazon delivered to me two new algorithms and data structures books. The first, Purely Functional Data Structures by Chris Okasaki, is going to take some in-depth perusing, because I'm still not very familiar with functional programming (I'm following Dustin Campbell's series on F# avidly, though). But the second, Data Structures and Algorithms Using C# by Michael McMillan is a different kettle of fish: it's in C#, which I know very well — along with the .NET Framework — and it's about some very familiar data structures and algorithms.

Before going any further with this review of the second book, let me expose my biases. Nearly seven years ago, I wrote a data structures and algorithms book for Delphi. I sweated blood and tears over it, determined to make it as accurate as possible. I've long had a dream of rewriting it for C#, but the memories of completing the first book are still too fresh. So although I haven't written such a book, and I'm not writing one now, I may still do so one day. Bear that in mind.

A friend (Michael Covington) of a friend (Jeff Duntemann) posted this in his blog: "let me recommend the book Data Structures and Algorithms Using C#, by Michael McMillan", and added a 5-star review on Amazon as well. So, I made a note to buy it (I love buying algorithm books), and eventually I did. I don't know which version Mr. Covington got, but the version I got was appalling. I certainly cannot recommend it in any shape or form, at least not without a very thorough tech edit and rewrite (it is somewhat obvious that it's had none).

Basics first: paperback, about 350 pages spread over 17 chapters (~20 pages a chapter, so you already know it's not going to present topics in any depth); references section and index; big type (12-point?); more screenshots than diagrams; code presented bare without any syntax highlighting; no CD. I searched fruitlessly in the book and online for a URL to download the source code: nothing. I searched the text and online for the author's website (errata, anyone?): none. Cambridge University Press had no links to anything apart from a buy-now button. I did find his email address eventually.

Last night, after writing my latest PCPlus article (on Bézier curves, if you want to know), I sat down to a take-out combo chow mein and started perusing the book.

I skipped through the introduction pretty quickly. First sentence: "The study of data structures and algorithms is critical to the development of the professional programmer." OK, I can agree with that, though there is a lot more that the professional programmer should study as well. The prerequisites talked about having some familiarity with C#.

One of my favorite algorithm topics is sorting, so naturally I turned to chapter 3 (Basic Sorting Algorithms) next. Big mistake. In this chapter, Mr. McMillan covers three sorting algorithms: bubble, selection, and insertion sorts. A quick introduction, some waffling about the importance of sorting, and then we get into a "test bed" for the sort implementations and the first bit of code.

He declares a class called CArray (p.43). What the heck is that? Why not SortTestBed? Naming things according to their function is an important lesson for the professional programmer to learn. In essence, this class is a wrapper around a private int[] array called, er, arr. Please, no pirate jokes. Here's the code.

internal class CArray {
  private int[] arr;
  private int upper;
  private int numElements;

  public CArray(int size) {
    arr = new int[size];
    upper = size - 1;
    numElements = 0;
  }
  public void Insert(int item) {
    arr[numElements] = item;
    numElements++;
  }
  public void DisplayElements() {
    for (int i = 0; i <= upper; i++)
      Console.Write(arr[i] + " ");
  }
  public void Clear() {
    for (int i = 0; i <= upper; i++)
      arr[i] = 0;
    numElements = 0;
  }
}

(By the way, thanks to Mark Miller and Developer Express for producing CodeRush, otherwise this post would have turned out just like those early days of getting a Sinclair ZX81 mag and laboriously typing the latest snazzy program in.)

Looking at this code, god only knows why numElements was declared at all: it's read only in Insert(); everywhere else it's merely assigned. The code simply assumes that you insert exactly the right number of items into the array. There's no bounds-checking, the use of the internal array is primitive, and the for loops are just not idiomatic C# (looping against arr.Length would be easier to read). And where's foreach when you need it?
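For contrast, here's a minimal sketch of what a more defensible test bed might look like. The name SortTestBed and the details are my own invention, not anything from the book:

```csharp
using System;

// A hypothetical replacement for CArray: named for its purpose,
// bounds-checked, and with numElements actually earning its keep.
internal class SortTestBed {
  private readonly int[] arr;
  private int numElements;

  public SortTestBed(int size) {
    arr = new int[size];
  }

  public void Insert(int item) {
    if (numElements == arr.Length)
      throw new InvalidOperationException("Test bed is full");
    arr[numElements++] = item;
  }

  public void DisplayElements() {
    // only show what has actually been inserted
    for (int i = 0; i < numElements; i++)
      Console.Write(arr[i] + " ");
    Console.WriteLine();
  }

  public void Clear() {
    Array.Clear(arr, 0, arr.Length);
    numElements = 0;
  }
}
```

Nothing clever, but at least it fails loudly if you overfill it instead of throwing an unexplained IndexOutOfRangeException.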

Then the fun bit. Here's the first bit of code (p.44) that creates a new CArray:

CArray nums = new CArray();

It won't compile of course, since the CArray constructor requires a size parameter. Right away, I'm left with the feeling that Mr. McMillan doesn't know how to program in C# very well (or at the minimum doesn't know how to copy-and-paste working code into a document), and isn't familiar with professional coding practices.

The next bit of the text talks about filling an instance of the CArray class with random values, so that we can show that the various sorting routines work. Now, I would make this a method of the class, you know, FillWithRandomData() or something. But no, Mr. McMillan writes it as part of his Main method.

He says this in the text (p.44):

To instantiate a Random object, you have to pass a seed to the class constructor. This seed can be seen as an upper bound for the range of numbers the random number generator can create.

Two sentences with an error apiece. First of all, you don't have to pass a seed to construct an object of type Random, since there's another constructor available that doesn't require one. And the second sentence is completely baffling: the seed has nothing whatsoever to do with an "upper bound"; it merely defines the starting point in a stream of random numbers. A different seed gives a different starting point. Think of the random number generator as a very, very large table of random numbers, one with a gazillion entries; all the seed does is point at an entry and say "start here".
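A few lines of code demonstrate both points: Random has a parameterless constructor, and the seed controls reproducibility, not range:

```csharp
using System;

class RandomDemo {
  static void Main() {
    // No seed needed: this constructor picks a time-based seed.
    Random unseeded = new Random();

    // Two generators with the same seed produce the same sequence...
    Random a = new Random(42);
    Random b = new Random(42);
    Console.WriteLine(a.Next() == b.Next());  // always True

    // ...and Next() routinely returns values far larger than 42,
    // so the seed is clearly no "upper bound". Bounding the range
    // is the job of the Next(maxValue) overload:
    Console.WriteLine(unseeded.Next(100));    // a value in 0..99
  }
}
```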

Now it gets a little surreal. He discusses bubble sort and shows the code (p.46). I'll ignore the typecast that casts an element of the array to an int (it's not necessary), but merely note that the code for the bubble sort method is written as a member of the CArray class. This is so bizarre that it actually took me several goes of reading the code before I twigged what was going on. I honestly thought that he'd forgotten to pass the int[] array to be sorted as a parameter of the method. I really did.

public void BubbleSort() {
  int temp;
  for (int outer = upper; outer >= 1; outer--) {
    for (int inner = 0; inner <= outer - 1; inner++) {
      if ((int)arr[inner] > arr[inner + 1]) {
        temp = arr[inner];
        arr[inner] = arr[inner + 1];
        arr[inner + 1] = temp;
      }
    }
  }
}

Having presented his first sort implementation, I was expecting a quick aside on testing whether the code works or not. Well, almost: the programmer is apparently expected to print all the numbers to the console and check them visually, since the next section is about, er, displaying the values in the array on the console.

Oh, and the simple optimization to add to bubble sort to shortcut execution once the array is sorted is not shown here.
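For reference, the standard shortcut is a "swapped" flag: if a full pass makes no swaps, the array is already sorted and you can bail out early. A sketch, written as a member of the CArray class to match the book's style:

```csharp
// Bubble sort with the classic early-exit optimization the book omits.
public void BubbleSort() {
  for (int outer = upper; outer >= 1; outer--) {
    bool swapped = false;
    for (int inner = 0; inner < outer; inner++) {
      if (arr[inner] > arr[inner + 1]) {
        int temp = arr[inner];
        arr[inner] = arr[inner + 1];
        arr[inner + 1] = temp;
        swapped = true;
      }
    }
    if (!swapped)
      break;  // no swaps this pass: the array is sorted, stop now
  }
}
```

On already-sorted or nearly-sorted data this turns a hopeless O(n²) into something respectable; on random data it costs almost nothing.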

Next up, selection sort (p.48). The text says: "The outer loop moves from the first element in the array to the next to last element" (which is correct), but disappointingly the code does something different:

public void SelectionSort() {
  int min, temp;
  // should be "outer < upper" in this for loop
  for (int outer = 0; outer <= upper; outer++) {
    min = outer;
    for (int inner = outer + 1; inner <= upper; inner++) 
      if (arr[inner] < arr[min])
        min = inner;
    temp = arr[outer];
    arr[outer] = arr[min];
    arr[min] = temp;
  }
}

Again, Mr. McMillan is just not familiar enough with C# idiom, especially the for loop.

Finally we are introduced to insertion sort (p.50) with a typographic bug (it seems to use the Header4 style instead of Header3). The code presented here has a subtle performance bug.

public void InsertionSort() {
  int inner, temp;
  for (int outer = 1; outer <= upper; outer++) {
    temp = arr[outer];
    inner = outer;
    while ((inner > 0) && (temp <= arr[inner - 1])) {
      arr[inner] = arr[inner - 1];
      inner -= 1;
    }
    arr[inner] = temp;
  }
}

Bizarrely, he uses the -= operator here instead of --.

Now we get to the fun part. Mr. McMillan, although eschewing any discussion of big-Oh notation early on in the introductory chapter ("there is [...] not one mention of Big Oh analysis [in this book]"), decides to time the sorts but doesn't draw any conclusions from this timing (apart from the rather high-schoolish "X seems to be faster than Y", and "if you have more things to sort, it takes longer"), even though the data is screaming out for some simple analysis ("hey, wow, look at this: if we sort 10 times the number of items, it seems to take 100 times longer!"). But not here, since that would introduce the O(n²) notation or something and scare away the reader.

Despite this oddity, the weirdest thing about this timing section is the set of results. Baldly stated, selection sort was fastest by an order of magnitude, and insertion sort was slowest. Funnily enough, in his timing program, selection sort was timed first and insertion sort last. My own tests seven years ago showed that even unoptimized insertion sort (yes, there is a simple optimization that can be made to insertion sort to make its inner loop faster, but you won't find it in this book) was by far the fastest of the O(n²) sorts (oops, sorry, I hope I didn't scare you there). So what the heck was going on?

It was at this point that I wished for the source code. Please. Pretty please. Just so I could load it up in Visual Studio and run it for myself. The results were just so out-of-whack that there had to be a bug in his program. Unfortunately, we'll never know since we don't have the code that produced the screenshots, only the code that was pasted into the book's document.

Instead I started this exercise of typing in the code just so I could find out for myself. In doing so, I found the performance bug in his implementation of insertion sort (and proved to myself that it wasn't so bad as to produce his results). And of course, I also proved that insertion sort is nearly twice as fast as selection sort in this particular scenario (and selection sort was faster than bubble sort), even with his implementations.
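For anyone wanting to repeat the experiment, typing in the sorts is the tedious part; the timing harness itself is trivial. Something along these lines will do (Stopwatch lives in System.Diagnostics, and Array.Sort here is just a stand-in for whichever sort implementation you've typed in):

```csharp
using System;
using System.Diagnostics;

class SortTimer {
  // Times one sort over a private copy of the data, so that every
  // algorithm being compared sees the same unsorted input.
  static long Time(Action<int[]> sort, int[] data) {
    int[] copy = (int[])data.Clone();
    Stopwatch sw = Stopwatch.StartNew();
    sort(copy);
    sw.Stop();
    return sw.ElapsedMilliseconds;
  }

  static void Main() {
    Random rnd = new Random(42);    // fixed seed: repeatable runs
    int[] data = new int[10000];
    for (int i = 0; i < data.Length; i++)
      data[i] = rnd.Next(100);      // values 0..99, as in the book

    // Substitute BubbleSort/SelectionSort/InsertionSort for Array.Sort.
    Console.WriteLine("Sort took {0} ms", Time(a => Array.Sort(a), data));
  }
}
```

Note the fixed seed and the cloned input: time the sorts on different random data, or on each other's leftovers, and the comparison is meaningless.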

So, there must have been a bad bug in the program he used for the screenshots. An awful bug. One that he made no attempt to find (anyone who teaches or has learnt algorithms knows that bubble sort is always the slowest), but whose results he nevertheless committed to print for all time.

This one chapter that I've reviewed in depth, to the point of typing all the code into Visual Studio so that I could run and test it, is no bearer of good news about the rest of the book. The chapter is 13 pages long and there are errors of fact, buggy code listings, no attempts to draw meaningful conclusions or to optimize, several examples of inferior OO design and programming, no good examples of testing (I'm sorry, a "visual check" doesn't equal "testing"), and many instances of non-idiomatic C#.

So, on the basis of this review of just one chapter, I can guarantee that this book is dreadful. Do not buy it. Since there is no other book on algorithms in C#, I would even say buy a similar book in Java; you'll probably find it better than this one, even though you'll have to convert the code (and that's not too difficult from Java to C#).

Oh, the perf bug in his insertion sort? The inner loop that moves larger items to the right in the array (if you view it horizontally) moves items equal to the search item as well. There is no need to do that: all you have to do is find the first item that is not larger than the one you're trying to insert. Take away the equals symbol in the check. The bug is magnified by the way Mr. McMillan initializes the array with random numbers. He only uses the values 0 to 99, so there will be lots of duplicates: on average 100 of each for 10000 elements. So, in this case, on average each item to be inserted moves 50 items in the array too many.
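In code, the fix is a one-character change to the inner loop's comparison:

```csharp
// Insertion sort with the strict comparison: equal items are
// no longer shifted pointlessly to the right.
public void InsertionSort() {
  for (int outer = 1; outer <= upper; outer++) {
    int temp = arr[outer];
    int inner = outer;
    // Strict '>': stop at the first element not larger than temp.
    // The book's version used '>=' (written as temp <= arr[inner - 1]),
    // which keeps shifting through every duplicate.
    while ((inner > 0) && (arr[inner - 1] > temp)) {
      arr[inner] = arr[inner - 1];
      inner--;
    }
    arr[inner] = temp;
  }
}
```

As a bonus, the strict comparison also makes the sort stable in the useful direction: equal items keep their original relative order.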

[This is part 1 of a series of posts on this book. Part 2. Part 3. Part 4. Enjoy.]