The Task Parallel Library Series - The Task Class

Now that we understand the principles of Task Parallelism, let's see how to do it in .Net 4. In previous versions of .Net, there were two main options. One was to launch a new thread for each task:


var thread = new Thread(() =>
	{
		DoSomeWorkHere();
	});

thread.Start();

DoOtherWork();
DoOtherWork();
DoOtherWork();

thread.Join();

Console.WriteLine("All work completed");	

The other was to make use of the built-in thread pool [1]:


ThreadPool.QueueUserWorkItem(_ =>
	{
		DoSomeWorkHere();
	});

DoOtherWork();
DoOtherWork();
DoOtherWork();

Console.WriteLine("My work completed.");
Console.WriteLine("No idea what the threadpool task is doing.");

Which of these was best? Well, it largely depends on what you are doing. If you have some long-running task [2] to execute then launching a new thread is probably a reasonable way forward; although threads are relatively expensive objects, the cost is amortised over the life of the task which, for big tasks, makes the overhead pretty much irrelevant. It also leaves the thread pool (which has a limited number of threads) available for doing what it does best, which is the execution of relatively small tasks.

In general, the thread pool tended to be the option that most people went for; long-running tasks are not that common (remember that the longest sequential chain limits how fast your code can ever run, so long-running tasks are best avoided), and the thread pool offered a good combination of ease of use and performance.

However, doing anything more than a "fire & forget" approach with the thread pool is problematic. Once you call QueueUserWorkItem, you essentially lose contact with the work that you've just submitted. To keep in touch, you need to hand-roll the code yourself (usually involving some combination of one or more ManualResetEvent or Monitor objects). Why would you need to keep in touch? Well, if the task is performing some form of calculation or gathering data from some external service, you'd probably quite like to access the result. Sure, it can stick that in some member field of a class that is shared with the calling code, but you still need to know when the processing is complete before you attempt to access the field. (Hmm, shared stuff again. Wasn't that bad or something?)
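To make that plumbing concrete, here's a rough sketch of what getting a result back from the thread pool might look like. The class name, the RunAndWait method and the body of DoSomeWorkHere are all made up for illustration (the post never says what DoSomeWorkHere does), and I've used a captured local rather than a member field to keep it short. Treat it as one possible approach rather than the canonical one.

```csharp
using System;
using System.Threading;

public static class ThreadPoolResultSketch
{
    // Hypothetical stand-in for the DoSomeWorkHere() used in the post.
    static int DoSomeWorkHere()
    {
        Thread.Sleep(100); // simulate some work
        return 42;
    }

    public static int RunAndWait()
    {
        int result = 0; // shared state: written by the pool thread, read by the caller

        using (var done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                result = DoSomeWorkHere();
                done.Set(); // tell the caller that the result is ready
            });

            // ... the caller is free to do other work here ...

            done.WaitOne(); // block until the work item has signalled completion
        }

        return result;
    }

    public static void Main()
    {
        Console.WriteLine("Thread pool work returned {0}", RunAndWait());
    }
}
```

Note that this sketch does nothing about exceptions: if DoSomeWorkHere throws, the event is never set and the exception goes unhandled on the pool thread.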

Plus, what if your processing throws an exception? You need to handle that, otherwise it's game over for the whole process [3]. So you need to have a mechanism for communicating these events back to the "main" thread. And how about cancellation? What if a user initiates some task that could take some time, and then changes her mind? Ideally, you'd like to cancel that task to save on the unnecessary processing. Again, here come those volatile flags, events etc. to let you perform such synchronisation.
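As a hedged illustration of that extra plumbing, the sketch below bolts a hand-rolled cancellation flag and an exception "slot" onto a thread pool work item. Everything here (the class name, the Run method, the failFirst parameter, the loop standing in for real work) is invented for the example; it's one way of doing it, not a recommended pattern.

```csharp
using System;
using System.Threading;

public static class ThreadPoolPlumbingSketch
{
    // Hand-rolled cancellation flag; volatile so the pool thread sees the update.
    static volatile bool cancelRequested;

    public static string Run(bool failFirst)
    {
        Exception failure = null; // shared slot for carrying an exception back
        int stepsCompleted = 0;
        const int totalSteps = 1000;
        cancelRequested = false;

        using (var done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                try
                {
                    if (failFirst)
                        throw new InvalidOperationException("Simulated failure");

                    // Poll the flag between steps so the work can stop early.
                    for (int i = 0; i < totalSteps && !cancelRequested; i++)
                    {
                        Thread.Sleep(10); // simulate one step of work
                        stepsCompleted++;
                    }
                }
                catch (Exception ex)
                {
                    failure = ex; // must not escape, or the process dies
                }
                finally
                {
                    done.Set(); // always signal, even on failure
                }
            });

            Thread.Sleep(50);       // the "user" changes her mind shortly afterwards
            cancelRequested = true; // ask the work item to stop
            done.WaitOne();
        }

        if (failure != null) return "failed: " + failure.Message;
        return stepsCompleted < totalSteps ? "cancelled" : "completed";
    }

    public static void Main()
    {
        Console.WriteLine(Run(false));
        Console.WriteLine(Run(true));
    }
}
```

That's around fifty lines, of which only the loop body is the actual work.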

This is all pretty unsatisfactory. There's an awful lot of plumbing that needs to be written when using either the thread pool or regular threads, and this code obscures the logic that you're trying to express (you know, the stuff the customer is paying for. They typically don't give a damn about all this threading stuff). Plus it tends to end up relatively complex and hard to reason about, which makes the creation and/or introduction of bugs more likely.

So, this post is all about .Net 4 and yet all I've done so far is moan about the previous state of affairs. Let's see some new code:


var task = Task.Factory.StartNew(() =>
	{
		return DoSomeWorkHere();
	});

DoOtherWork();
DoOtherWork();
DoOtherWork();

task.Wait();

Console.WriteLine("All work completed. Task returned {0}", task.Result);

Simple stuff to start with (future posts will expand much further), but this code starts a Task and then proceeds to do some other work. At a later point, it waits on the task and gets back the result. You'll note the total absence of anything to do with threading - no locks, no events, nothing. You'll also note that there's no shared state - the Task does its work and returns the result - no messing about with shared fields and the associated synchronisation.
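A couple of details are worth spelling out with a small sketch. Because the lambda returns a value, StartNew hands back a Task<TResult> (Task<int> below), and reading Result blocks until the task has finished, so the explicit Wait() is optional. An exception thrown inside the task is captured and rethrown, wrapped in an AggregateException, when you call Wait() or read Result. The class, the method names and the return value of 42 are assumptions of mine for the sake of the example.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskResultSketch
{
    // Hypothetical stand-in for the DoSomeWorkHere() used in the post.
    static int DoSomeWorkHere()
    {
        return 42;
    }

    public static int GetResult()
    {
        Task<int> task = Task.Factory.StartNew<int>(() => DoSomeWorkHere());

        // Result blocks until the task completes, so no separate Wait() is needed.
        return task.Result;
    }

    public static string GetFailureMessage()
    {
        Task<int> task = Task.Factory.StartNew<int>(() =>
        {
            throw new InvalidOperationException("Simulated failure");
        });

        try
        {
            task.Wait(); // the task's exception surfaces here, on the waiting thread
            return "no failure";
        }
        catch (AggregateException ex)
        {
            // The original exception is wrapped inside the AggregateException.
            return ex.InnerException.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Task returned {0}", GetResult());
        Console.WriteLine("Task failed with: {0}", GetFailureMessage());
    }
}
```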

So what are the Task and the Factory types? You can think of Task as a representation of some value (the result) which will be available at some point in the future, and it provides suitable members for waiting on and accessing the result once execution completes. The Factory (a TaskFactory, which works with a TaskScheduler) provides the abstraction over the mechanism by which the task is scheduled & executed - is it in the thread pool, does it run on some specific thread (maybe the UI thread, for example) or is it some async call that doesn't actually have a thread permanently associated with it? The default factory (as used above) simply delegates to the thread pool (but with all the plumbing that's needed to make it useful) but that can be overridden, which we'll look at in a future post.
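As a taster, here's a minimal sketch of two StartNew overloads that make the scheduling choice explicit. The first names TaskScheduler.Default (the thread pool scheduler) outright; as far as I know this matches the plain StartNew call when it is made from ordinary code that isn't itself running inside a task on some other scheduler. The second passes TaskCreationOptions.LongRunning, which is a hint that the work may deserve a dedicated thread instead of tying up a pool thread. The class and method names, and the body of DoSomeWorkHere, are mine.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskSchedulingSketch
{
    // Hypothetical stand-in for the DoSomeWorkHere() used in the post.
    static int DoSomeWorkHere()
    {
        return 42;
    }

    public static int RunOnDefaultScheduler()
    {
        // Spell out the scheduler instead of relying on the factory's default.
        Task<int> task = Task.Factory.StartNew<int>(
            () => DoSomeWorkHere(),
            CancellationToken.None,
            TaskCreationOptions.None,
            TaskScheduler.Default);

        return task.Result;
    }

    public static int RunAsLongRunning()
    {
        // LongRunning is only a hint; the scheduler decides how to honour it.
        Task<int> task = Task.Factory.StartNew<int>(
            () => DoSomeWorkHere(),
            TaskCreationOptions.LongRunning);

        return task.Result;
    }

    public static void Main()
    {
        Console.WriteLine("Default scheduler: {0}", RunOnDefaultScheduler());
        Console.WriteLine("Long running: {0}", RunAsLongRunning());
    }
}
```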

So, that hopefully gives you an easy introduction to the Task class. Future posts will drill into more depth on usage scenarios, covering many of the things that were problematic prior to .Net 4.0.


[1] BTW, what's with that odd "_" in the lambda for QueueUserWorkItem? Well, the delegate that it is expecting is a WaitCallback, which is defined as public delegate void WaitCallback(object state);. From that, you can see that any method queued to the thread pool takes a state argument. In our lambda, we don't make use of this, and the convention is to use "_" as the name for an unused argument in a lambda. "_" is a perfectly valid C# identifier, so there's no magic here. Of course, if you have more than one argument that is unused, you can't use "_" for both of them!

[2] By long-running task, I mean some continuous "thread" of execution within a program. Clearly there are many examples of long-running business tasks that can last many hours (or days or weeks), but this behaviour is typically implemented through some state that is persisted and "resumed" at the appropriate points. Trying to model a business process by running a single thread that starts when the business process starts and just keeps running until the business process has finished is unlikely to be a great solution :)

[3] By default (and quite correctly, IMO), the .Net runtime will abort any process that has an unhandled exception on any thread. This was a change made in, if I recall, .Net 2.0 - prior to that, it was only unhandled exceptions on the main thread that caused the process to abort. On other threads the thread would just silently die, leaving the application running but now in some (by definition) undefined state. I don't know about you, but hoping that code is going to execute properly when the application state is undefined doesn't sound like the best idea in the world.
