Testing the lock-free stack

Stack of books with laptop on topRecently a reader was having some issues with my lock-free queue implementation. In investigating the problem, I decided to throw out my previous “stack and queue” multithreaded test program as being way too hard to modify and started out afresh. In doing so, I also cleaned up the implementations of the lock-free stack and queue, making extensive use of CodeRush’s code issues functionality.

For details on the lock-free stack, read here. For details on the lock-free queue, see here.

In this post. I’ll talk about testing the lock-free stack in a multithreaded application.

First of all, here’s the latest version of the lock-free stack code. There’s not that much change from the original:

using System;
using JmBucknall.Threading;

namespace JmBucknall.Structures {

  public class LockFreeStack<T> {
    SingleLinkNode<T> head;

    public LockFreeStack() {
      head = new SingleLinkNode<T>();

    public void Push(T item) {
      SingleLinkNode<T> newNode = new SingleLinkNode<T> { Item = item };
      do {
        newNode.Next = head.Next;
      } while (!SyncMethods.CAS<SingleLinkNode<T>>(ref head.Next, newNode.Next, newNode));

    public bool Pop(out T item) {
      SingleLinkNode<T> node;
      do {
        node = head.Next;
        if (node == null) {
          item = default(T);
          return false;
      } while (!SyncMethods.CAS<SingleLinkNode<T>>(ref head.Next, node, node.Next));
      item = node.Item;
      return true;

    public T Pop() {
      T result;
      Pop(out result);
      return result;

First of all, I tested the stack to make sure it did the standard push and pop properly in a single-threaded environment. All that’s needed is a few unit tests and, to be frank, these tests are boring and tedious and I won’t show them here. Of more interest is the use of the stack in a multithreaded application.

Back in the old days, I tested on a dual core machine, but now I have a quad-core desktop, so I wanted to test the code using all four cores. In essence I wanted to have one or more “pusher” threads and one or more “popper” threads. With four cores, to avoid as much CPU contention as possible, I could have one pusher and three poppers, two pushers and poppers each, or three pushers and one popper. (Of course I could increase those numbers as far as I wanted to go, but I would run into thread scheduling on each CPU and I rather wanted to stress the concurrency code as much as possible. Having each thread on its own core would certainly do that.)

The flow of the test app would work like this. I created the shared lock-free stack; for ease of testing, making it a stack of ints. To make sure that the pop code didn’t (a) miss an item or (b) pop the same item twice due to any unforeseen multithreaded issues, I decided to check off each item as it was popped. For that I needed an array of results (essentially booleans) to show which values had been popped or not. Since I wanted to use the thread pool in .NET, I needed a way to know in my main thread when each subsidiary thread, be it a pusher or a popper, had finished. The simplest way to do that is to use a manual reset event (ManualResetEvent) for each thread and then signal it (that is, call Set())  at the end of the thread’s method.

For the pushers, I decided to push all the numbers from 1 to some large value (here, 10 million). To make sure that I didn’t push a value twice, I used the Interlocked.Increment() method ensuring that each pusher got its own values.

  static class Globals {
    public const int TopValue = 10000000;
    public const int PusherCount = 1;
    public const int PopperCount = 3;
    public static int PushValue;
    public static bool NoMoreData;

  public class ThreadData {
    public LockFreeStack<int> stack;
    public ManualResetEvent completion;
    public int[] results;

    public ThreadData(LockFreeStack<int> stack, ManualResetEvent completion, int[] results) {
      this.stack = stack;
      this.completion = completion;
      this.results = results;

  static class PusherEngine {
    static public void Execute(Object stateInfo) {
      ThreadData data = stateInfo as ThreadData;
      int pushCount = 0;

      int nextValue = Interlocked.Increment(ref Globals.PushValue);
      while (nextValue <= Globals.TopValue) {
        nextValue = Interlocked.Increment(ref Globals.PushValue);

      Console.WriteLine(String.Format("Push count: {0}", pushCount));

The Globals class is just a holder for some constants and global values. The ThreadData class is merely a simple record-type class to hold the data for the threads. In essence, the only thread-specific member of that class is the completion reset event, so the other members could have been put in the Globals class. Hey, ho.

For the poppers, I decided to let them run, just popping values off the stack until they got a special “close-down” value. After that they just terminated.

  static class PopperEngine {
    static public void Execute(Object stateInfo) {
      ThreadData data = stateInfo as ThreadData;

      int popCount = 0;
      while (true) {
        int value = data.stack.Pop();
        if (value == 0) {
          if (Globals.NoMoreData)
          Thread.Sleep(1); //minor wait when stack is empty
        else {
          int oldValue = Interlocked.CompareExchange(ref data.results[value], 1, 0);
          if (oldValue != 0) {
            Console.WriteLine(String.Format("Error: already popped {0}", value));

      Console.WriteLine(String.Format("Pop count: {0}", popCount));

Notice here the checking to see whether a value has already been popped or not. A value of 0 popped off the stack means that the stack is temporarily empty, and so the popper just waits a smidgeon before going round and trying again. If the stack was empty and the Globvals.NoMoreData item was set, it means that all the pushers have finished adding items to the stack and so it’s time for the popper to terminate.

Now the actual console application:

  class Program {
    private static ManualResetEvent[] QueueWorkItems(int count, LockFreeStack<int> stack, int[] results, WaitCallback callback) {
      ManualResetEvent[] events = new ManualResetEvent[count];
      for (int i = 0; i < count; i++) {
        events[i] = new ManualResetEvent(false);
        ThreadPool.QueueUserWorkItem(callback, new ThreadData(stack, events[i], results));
      return events;

    static void Main() {
      DateTime start = DateTime.Now;

      Console.WriteLine("create the shared stack");
      LockFreeStack<int> stack = new LockFreeStack<int>();

      Console.WriteLine("create the shared results array");
      int[] results = new int[Globals.TopValue + 1];

      Console.WriteLine("create the completion events");
      var pushersDone = QueueWorkItems(Globals.PusherCount, stack, results, PusherEngine.Execute);
      var poppersDone = QueueWorkItems(Globals.PopperCount, stack, results, PopperEngine.Execute);

      Console.WriteLine("wait for the pushers to be done");

      Console.WriteLine("signal the poppers to stop, and wait");
      Globals.NoMoreData = true;

      Console.WriteLine("check that all values were popped");
      for (int i = 1; i <= Globals.TopValue; i++) {
        if (results[i] != 1)
          Console.WriteLine(String.Format("Error: {0} was never popped.", i));

      DateTime end = DateTime.Now;
      Console.WriteLine(end - start);


The QueueWorkItems method creates the manual reset event for each thread and queues the thread up in the thread pool. It returns an array of the manual reset events created so that the main thread can wait on them all to be signaled with WaitAll(). Firstly the main thread waits for all the pushers to finish, sets the “no more data” boolean, and then waits for all the popper threads to complete. Finally it makes sure that all the integer values were popped.

Using this application over several runs with differing numbers of pushers and poppers, I checked that the stack functioned properly. And it did. All four cores were pegged at the max for the few seconds of the test run.

Next time, the queue.




Album cover for Far Away Trains Passing ByNow playing:
Schnauss, Ulrich - nobody's home
(from Far Away Trains Passing By)

Loading similar posts...   Loading links to posts on similar topics...

10 Responses

#1 Thorsten Engler said...
27-Mar-10 10:14 PM

Your LockFreeStack looks susceptible to the ABA problem to me. en.wikipedia.org/.../ABA_problem

I guess it's possible that because you are always allocating new SingleLinkNode's for your push's combined with the GC significantly reduces the chance of the problem occurring, but I don't think it can guarantee that the problem will never occur.

#2 Dmitriy V'jukov said...
27-Mar-10 11:38 PM

Can your testing technique reveal the error in the following stack implementation? www.rsdn.ru/.../3719883.1.aspx (it's a stack that avoids dynamic memory allocation, built over an array)

#3 gabr said...
28-Mar-10 3:39 AM

I found out that the most important thing when testing multithreaded (and especially lock-free) code is to run more threads than there are CPUs. That causes threads to enter wait state more often and because of that ABA problems exhibit much more frequently.

julian m bucknall avatar
#4 julian m bucknall said...
28-Mar-10 10:25 AM

All: There is no ABA problem with this lock-free stack code in .NET. The reason is entirely due to the garbage collector and the memory model in .NET: no nodes can be collected (and hence reused) until there are no more references to them across all threads. It's not possible for a node to be reused while some thread has a reference to it. I wrote at length about this in a rebuttal to someone who did think there was an ABA problem:


So, if I were to write a "lock-free node manager" and pushed dead nodes onto it in the Pop and allocated them from it in the Push, then, yes, I would have an ABA problem. In spades. I know because this is what I did four years ago and I learned my lesson at that time. Using this code in any non-garbage-collected environment, like Delphi, could also produce the ABA problem. I know because this is what I did four years ago and I learned my lesson.

Cheers, Julian

PS: Check out Joe Duffy's version of this lock-free stack in Concurrent Programming on Windows, pp. 534-537. He also talks about why the ABA problem doesn't occur in this situation.

julian m bucknall avatar
#5 julian m bucknall said...
28-Mar-10 11:25 AM

Dmitriy: Youch. My first thought is: any time you expose the internals of the implementation that manages a data structure, you're asking for trouble. So here the array is public, the indexes into the array are public, the elements of the array are publicly known. I'm not even sure the code would work in a single-threaded test app (in fact, you can never pop the final item in the stack as far as I can see). There's no connection between the length of the array and the pushCount parameter.

I'd be rewriting it so that you push and pop T values and the rest is hidden. Obviously the Push() method would have to return a value to indicate the stack is full.

Cheers, Julian

#6 Testing the lock-free queue said...
28-Mar-10 12:33 PM

Having discussed a test app for testing the multithreaded use of the lock-free stack , it’s now time for the lock-free queue. First of all, I decided to play around with the lock-free queue more than I had done with the lock-free stack. The queue works

#7 Dmitriy V'jukov said...
28-Mar-10 10:44 PM

@Julian Well, the code looks not very good. AFAIU, the use case is to allocate a big array, and then pass pieces of it to several stacks. AFAIU, the main author's concern was to avoid dynamic memory allocation and get pressure off of the GC.

However, actually I do not care. I write in C and structure my code quite differently. I am about lock-free.

So can you spot the error in the algo? If not, how can you be sure that there is no such error in your algos?

julian m bucknall avatar
#8 julian m bucknall said...
29-Mar-10 9:59 AM

Dmitriy: You mean the error in that implementation apart from all the other errors I've already mentioned? I'd have thought that "you can't pop the only item in the stack" to be a pretty big error straight off.

Also, there's no incentive to try and solve a "there's one or more bugs in this code" problem without getting any information about what errors are seen in a test app. Maybe the bugs are in the test app.

Heck, I'm not even sure how it's supposed to be used, so a test app would be more than welcome. In using a multithreaded stack, I'm more used to having 'producers' that push 'work' items on the stack, and 'consumers' that pop them off. Here I'm not even sure how a consumer would signal to a producer that it had completed the work on index N and hence that index can now be reused. Yuk, even writing that down gives me the shivers.

Cheers, Julian

#9 Nasty ABA problem in array-based lock-free stack said...
29-Mar-10 9:28 PM

In the comments to my post on testing the lock-free stack , Dmitriy V'jukov wondered if my code could test a broken implementation of a stack built on a preallocated array . He provided a link to the code on a Russian forum (yes, I used Google Translate

#10 Dew Drop – March 29, 2010 | Alvin Ashcraft's Morning Dew said...
30-Mar-10 5:20 AM

Pingback from Dew Drop – March 29, 2010 | Alvin Ashcraft's Morning Dew

Leave a response

Note: some MarkDown is allowed, but HTML is not. Expand to show what's available.

  •  Emphasize with italics: surround word with underscores _emphasis_
  •  Emphasize strongly: surround word with double-asterisks **strong**
  •  Link: surround text with square brackets, url with parentheses [text](url)
  •  Inline code: surround text with backticks `IEnumerable`
  •  Unordered list: start each line with an asterisk, space * an item
  •  Ordered list: start each line with a digit, period, space 1. an item
  •  Insert code block: start each line with four spaces
  •  Insert blockquote: start each line with right-angle-bracket, space > Now is the time...
Preview of response