Thread Pool (part 1)

published: Thu, 1-Dec-2005   |   updated: Tue, 7-Mar-2006

So a reader commented about my WaitableThread class. "Couldn’t you use a custom thread pool for this?" and encouraged me to take a look at a "smart" thread pool on Code Project. As it happened, the one he was recommending was one I'd taken a look at previously. Yes, you see the developer who wanted to launch lots of threads (see my previous post) had discovered this implementation and was all for using it. I took a look at it and decided that I could see enough problems on a cursory examination that I didn't want to use it for real code. Hence, the effort to write a WaitableThread class.

(Honestly I don't want to list the issues I found with this thread pool, but a lot of them had to do with the class model design and how well it was tested -- wishy-washy issues, if you like. Nevertheless, I would advise you not to use it with .NET 2.0, especially not on a 32-bit Pentium, since it does have at least one bug that may show up on a 64-bit processor.)

So my correspondent replied, "I hadn’t looked at the code, but I do like the idea of using my own reusable pool of threads (or other objects). Done right, it seems a good way to avoid overloading the framework’s thread pool, control the number of threads, and prevent garbage collector abuse. I haven’t taken this route on anything, but I’ve considered it a few times."

Reading between the lines, I took this as a gauntlet contemptuously thrown down: "Well, then, Mr SmartyPants, write a better one!" (Well, OK, my reader didn’t really mean this, but it sounds good. A defiant act to stir the emotions, eh?)

So, let's see what we can do. My proposal is this: I'll design and write a thread pool class for .NET in C#, using the best TDD I know. A little like Ron Jeffries does on his site. This way, you get a decent thread pool you can use, an understanding of what it does (and what it doesn't do), and also an idea of what it's like to write something non-trivial TDD style. I'll be taking many days to do this, by the way; obviously I still have a real job to do, so don't take the elapsed calendar time as an indication of how much work TDD is.

If you like, email me as I go along to proffer other requirements, to give me encouragement, or to lay into me about my obvious dearth of programming and design skills. I'll try and incorporate suggestions and advice into successive posts.

For this first installment on the journey, we need to lay down some requirements. Yes, even when developing software TDD-style we need a plan, a notional view of what we want to accomplish.

So, without further ado, here's my initial list of requirements for a thread pool.

  • The user should be able to queue tasks for the thread pool to execute
  • The thread pool, at its discretion, should dequeue a task and execute it
  • The tasks should have a priority (one of a limited number)
    • Higher-priority tasks are dequeued before lower-priority tasks
    • Higher-priority tasks do not pre-empt lower-priority tasks that are already executing
  • A task should be "waitable", that is, the user should be able to block/wait on the task until it completes (in .NET terms, the task should be a WaitHandle). This means the user code can queue up a bunch of tasks and then wait for the first to finish (or all of them to finish).
  • The thread pool should execute each task in a separate thread
  • The thread pool should be thread-safe (that is, callable from many user threads in a thread-safe manner)
  • The thread pool should solely be responsible for managing its threads, the user should have no contact with them
  • The user should be able to abort a task
    • If the task has not yet started, it is merely dequeued and thrown away
    • If the task is executing, the thread pool will make a best effort to abort the thread
      • First it will signal a flag to ask the task to abort as quickly and as safely as possible
      • Second, if a given timeout expires, the thread (and hence the task) is simply aborted
    • If the task has already completed or cannot be found (that is, the thread pool has never seen it), nothing happens
  • The thread pool, at its discretion, can add one or more threads so that it can execute more tasks in parallel
  • Once a thread has completed a task, it should wait until another task is available
    • There should be a user-defined millisecond timeout on this behavior
      • 0 means "don’t wait, just terminate"
      • If the timeout occurs, the thread should terminate
    • This behavior allows the number of threads to decrease
  • The user can specify the maximum number of threads the thread pool can use
  • The thread pool should maintain some kind of statistics on its performance
    • Details to be decided, but may include number of tasks waiting, executing, completed, some timing data, etc
    • The statistics should be queryable as the thread pool is running
  • The thread pool should be able to log exception, warning, informational, etc events
    • Informational events could include
      • A task is enqueued, dequeued, completed
      • A new thread is started
      • An existing thread terminates
    • Warning events could include
      • A task is aborted
    • Exception events should include
      • Any exception thrown inside the thread pool and its related classes
  • The thread pool should not be a singleton (that is, it should be a normal instance class)
  • On completion (whether the task completed normally, or was aborted), the user should be able to query some statistics about the task (e.g., time spent queued, time spent executing, completion status).

Wow, I think that'll do for a start.

Already I can see several possible classes making themselves known from the requirements as written. The thread pool class for a kick-off, a task class, maybe a thread class,a priority queue (mmm, how did that get in? funny that), some kind of log. However, at this stage, that's as far as I want to go: maybe I'm wrong about some of them. We'll start exploring the domain model with TDD next time.