julian m bucknall >> A little linked list trick

A little linked list trick

published: Tue, 5-Dec-2006 | updated: Tue, 5-Dec-2006

Let's have a little bit of fun today and talk about a minor yet elegant algorithmic change to a singly-linked list to make it a teensy bit more functional, and in doing so talk a bit about generics. This is certainly not earth-shattering stuff, but merely an enjoyable diversion from the horrors of your daily development. I'll be using C# 2.0 to implement this algorithmic amuse-bouche, so fire up your Visual Studios.

It'll also be uplifting for me to write this because a newsgroup post a couple of weeks ago from someone I've never met or corresponded with cast doubt on my algorithmic abilities (essentially "all he's ever done is write a book on algorithms"). All righty, then...

I'm sure my readers know all about a singly-linked list. It's been done to death, right? You learned this stuff at your mother's knee and there's just nothing new to say about it, so why an article about them now we're in the early 21st century?

Despite the growth of use of array-based lists, where you pre-allocate a big block of memory and index into it to place items, linked lists still have their uses. I've talked at length about how basic lock-free data structures, like the stack and the queue, are made "simpler" by using an underlying singly-linked list, and also in certain situations, using a linked list can be faster or simpler than using an array-based list.

Let's start by thinking about what our singly-linked list is going to do. I warn you I'm going to keep it really simple. It's going to be a class that stores a sequence of items. This class is enumerable (that is, we can ask an instance of the class for an enumerator and use that enumerator to visit each of the items). There are two places where we can insert or delete a item: the front of the list and the back of the list.

Using an instance of this basic class as a delegate, we can easily implement a stack, a queue, and a deque (that is, a "double-ended queue", a queue where you can enqueue or dequeue at either end), and all operations on these classes bar one will be O(1) — that is, will run in constant time — no matter how many items are in the linked list. The one exception, as we'll see, is the removal of the item at the end of the list which will be O(n); that is, it will take time proportional to the number of items in the list. A worthwhile class to have in our C# toolbox, in other words.

Now we've nailed down what we want, we can start writing the code. First up, we need a node class. Pretty simple stuff. My first bit of code looked like this:

  internal class Node<T> {
    private T item;
    private Node next;

    public Node(T item) {
      this.item = item;
    }
    public Node Next {}
    public T Item {}
  }

  public class SimpleLinkedList<T> : IEnumerable<T> {

  }

But then I thought, why not make it an nested class instead. That way, especially if I made it private, I wouldn't be polluting the namespace with identifiers that no one could use. So I altered the code like this:

  public class SimpleLinkedList<T> : IEnumerable<T> {

    private class Node {
      public T Item;
      public Node Next;

      public Node(T item) {
        Item = item;
      }
    }

  }

Notice something though. In doing so, the Node class seems to "lose" its type parameter, and yet it is still using that placeholder type. What's going on?

Think of it like this. The declaration of SimpleLinkedList<T> is defining a new placeholder type for its scope; that is, within its open and close braces there is another type T, together with all the other normal types like int, string, double, DateTime, IDisposable, ArrayList and so on. So the Node class doesn't have to be a generic type at all, it can just be a normal class and can use this fab new type T whenever it wants to. Mind you, T is not exactly forthcoming about its abilities: there's not a lot you can do with it, apart from store an instance of it and so on.

So it would be an error for Node be generic and to declare T as its type parameter: T is already defined within its scope. Although I could have declared the class to be Node<U>, say, using a different type parameter, it doesn't make much sense and I don't gain anything from it. I certainly would have had to write some downright confusing code.

Notice something else, as well. Since Node is entirely private and solely within the SimpleLinkedList's purview, I don't need to declare its fields as properties. I'm totally in control here — no one else will be able to use this Node class — and so I've no need to "protect" the fields with property declarations.

After this little diversion on generics, here's the linked list trick I alluded to before. In a "normal" singly linked list where we want to maintain a link to the head and the tail of the list, we have a Head node and a Tail node, and we have to maintain them both as we manipulate the list (see here where I use a singly linked list to create a queue, before I make the queue lock- free). We're doubling our work. The trick is to just maintain a link to the tail of the list and to make the list circular. In other words, instead of the tail node's Next pointer being null to represent the end of the list, we make it point to the head of the list instead.

With this little trick, we've made things a lot simpler. You want the tail node of the list? The Tail field is it. You want the head node of the list? Tail.Next is it. You want to know if the list is empty? The answer: Tail is null. Has the list only one node? Well, if Tail == Tail.Next, it has. Want to clear the list? Set Tail to null.

Let's see how various operations are implemented. Adding an item to the head of the list first:

    public void AddAtStart(T item) {
      Node node = new Node(item);
      if (Tail == null) {
        Tail = node;
        node.Next = node;
      }
      else {
        node.Next = Tail.Next;
        Tail.Next = node;
      }
      count++;
    }

If the list is empty, we set the new node's Next reference to point to itself and set Tail to be the new node. If the list is non empty, we add the node between Tail and its Next reference. Pretty easy.

Adding a node to the end of a list? Very, very similar:

    public void AddAtEnd(T item) {
      Node node = new Node(item);
      if (Tail == null) {
        Tail = node;
        node.Next = node;
      }
      else {
        node.Next = Tail.Next;
        Tail.Next = node;
        Tail = node;
      }
      count++;
    }

So similar, in fact, we have an easy refactoring to make to remove the duplication.

    private Node AddItem(T item) {
      Node node = new Node(item);
      if (Tail == null) {
        Tail = node;
        node.Next = node;
      }
      else {
        node.Next = Tail.Next;
        Tail.Next = node;
      }
      count++;
      return node;
    }

    public void AddAtStart(T item) {
      AddItem(item);
    }

    public void AddAtEnd(T item) {
      Tail = AddItem(item);
    }

Deleting a node from the front of the list is very simple too:

    public void DeleteFromStart() {
      if ((Tail == null) || (Tail.Next == Tail)) {
        Clear();
        return;
      }

      Tail.Next = Tail.Next.Next;
      count--;
    }

So, if the list is empty or only has one item, we clear the list, otherwise we unlink the head node from the list.

Deleting a node from the end of the list is the complex one since we need to find the parent of the tail node. This is the operation that takes time proportional to the length of the list, since we have to follow the links round the list storing parent nodes until we reach the tail node again:

    public void DeleteFromEnd() {
      if ((Tail == null) || (Tail.Next == Tail)) {
        Clear();
        return;
      }

      Node parent = Tail.Next;
      Node walker = parent.Next;
      while (walker != Tail) {
        parent = walker;
        walker = parent.Next;
      }
      parent.Next = Tail.Next;
      Tail = parent;
      count--;
    }

Again we have the quick test for an empty list or a list with only one node. We can refactor the duplication as well:

    private bool HaveClearedList() {
      if ((Tail == null) || (Tail.Next == Tail)) {
        Clear();
        return true;
      }
      return false;
    }

    public void DeleteFromStart() {
      if (!HaveClearedList()) {
        Tail.Next = Tail.Next.Next;
        count--;
      }
    }

    public void DeleteFromEnd() {
      if (!HaveClearedList()) {
        Node parent = Tail.Next;
        Node walker = parent.Next;
        while (walker != Tail) {
          parent = walker;
          walker = parent.Next;
        }
        parent.Next = Tail.Next;
        Tail = parent;
        count--;
      }
    }

Since SimpleLinkedList<T> implements the IEnumerable<T> interface we need to provide those interface methods too. I implemented them explicitly:

    public IEnumerator<T> GetEnumerator() {
      if (Tail != null) {
        Node current = Tail.Next;
        yield return current.Item;
        while (current != Tail) {
          current = current.Next;
          yield return current.Item;
        }
      }
    }

    IEnumerator<T> IEnumerable<T>.GetEnumerator() {
      return this.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() {
      return this.GetEnumerator();
    }

I have to say that iterators (which is what GetEnumerator() is) make writing code to enumerate through a collection-like object so much easier.

For completeness here are the other methods and properties.

    public void Clear() {
      Tail = null;
      count = 0;
    }

    public T First {
      get {
        if (Tail == null)
          return default(T);
        return Tail.Next.Item;
      }
    }

    public T Last {
      get {
        if (Tail == null)
          return default(T);
        return Tail.Item;
      }
    }

    public int Count {
      get { return count; }
    }

I just love the expressivity of the Clear() method: I just set the Tail reference to null and trust the garbage collector to actually reclaim the memory used.

So there you have it: a nifty little trick that helps you more simply implement a singly linked list that can be used for stacks, queues, and deques, and a quick discourse on writing better generics. A tasty little amuse-bouche indeed.