July 2010 Entries
The Task Parallel Library Series - Task Parallelism

So far, we've looked the the motivations behind concurrency and Data Parallelism as a technique to introduce concurrency into your code. We've also looked at some simple examples of Parallel.For() and Parallel.ForEach(). In this post we're going to explore the other main way that concurrency can be harnessed, Task Parallelism.

Data parallelism was all about executing the same block of code on many different data items. Task parallelism is about concurrently executing different blocks of code, perhaps operating on the same data, perhaps on different. What's important is that the tasks are, as far as possible, independent. If there are dependencies between them things will rapidly get more complex, and all those issues we discussed in data parallelism such as corruption and deadlocks will raise their heads.

Let's look at an example. Firstly, in the familiar sequential world:


public void OnlineStore()
{
	TakeOrder();
	SendOrderReceivedNotification();
	PrepareInvoice();
	SendInvoice();
	PickGoodsFromShelf();
	UpdateStockControl();
	PackGoods();
	ShipGoods();
	SendDispatchedNotification();
}

In here, we have a whole bunch of different methods (tasks, code blocks, call them what you want). Some are dependent on others, some aren't. Let's see how it could be parallelised (pseudocode only):


public void Amazon()
{
	TakeOrder();
	
	in parallel do
		=> SendOrderReceivedNotification();
		=> DealWithInvoicing();
		=> DealWithTheGoods();
}

public void DealWithInvoicing()
{
	PrepareInvoice();
	SendInvoice();
}

public void DealWithTheGoods()
{
	PickGoodsFromShelf();

	in parallel do
		=> UpdateStockControl();
		=> SendGoods();
}

public void SendGoods()
{ 
	PackGoods();
	ShipGoods();
	SendDispatchedNotification();
}

A bit more code, but still readable (I'd actually argue a bit more readable since the first method was, IMO, doing way too much stuff). Basically all we've done is divide the tasks into those that are dependent and those that aren't. For the independent tasks, we're using a new "in parallel do" construct to indicate to the language / runtime that these can be run on different threads. In the new code, the longest chain from start to completion is 5 steps compared to the 9 steps in the sequential code, so we should expect some speedup when run on a multi-core system.

How much speedup? Since we've dropped from 9 steps to 5, does that indicate that we should now be almost twice as fast? Clearly not, since each step is unlikely to have the same sequential execution time. Let me introduce Gene Amdahl who formulated the fundamental limits to what parallelisation could achieve. In essence, he stated that the maximum speedup that can be achieved through concurrency is limited by the sequential part of the program. That is, if you have a program that takes 10 hours to run and within that there is a task of 2 hours that cannot be parallelised, then the fastest that program can run, regardless of the number of processors that can be thrown at it, is 2 hours. Not exactly rocket science, but it earned the title of Amdahl's Law; go take a look if you want to see the maths (it's not hard). Data Parallelism enjoys the fact that the sequential parts can be very small, and so the possible speedups can be truly enormous. Task parallelism is more restricted, in that it's limited to how fast the longest (in time, not code) dependent chain of tasks can run.

If anyone's reading all this, are you spotting a theme yet? Concurrency is important (and will get more so), but dependencies are the problem. Dependencies introduce the bottleneck beyond which you're relying on the hardware to improve it's sequential execution speed to see improvements. And we've already seen that that's unlikely. Dependencies are also the thing that cause you to start dropping locks and other synchronisation mechanisms in your code, obscuring your "business rules" and introducing the risk of hard-to-test, hard-to-diagnose bugs. Loosely coupled, independent code is something that you should strive for, and immutable data structures assist greatly in this. I'm thinking that perhaps the "I" in SOLID could be re-purposed...

That should give an idea as to what Task Parallelism is all about. In the next post, we'll look at how to do this in .Net 4.0.

The Task Parallel Library Series - Parallel.For & Parallel.ForEach

In the previous post we discussed the principles behind data parallelism, in this one we'll show how to implement it in .Net 4.0. This will probably be a short post, since it's really simple. Here's a regular sequential loop:


var data = GetTenThousandElementArray();for (var i = 0; i < 10000; i++)
{
 DoSomeProcessing(data[i]);
}

And here's the parallelised version:


var data = GetTenThousandElementArray();Parallel.For(0, 10000, i =>
{
 DoSomeProcessing(data[i]);
});

Pretty simple stuff, and highly readable. Just switch from the C# for keyword to a call to the static For() method on the System.Threading.Tasks.Parallel type. In it's basic form, this method takes three parameters - the "from" value (inclusive), the "to" value (exclusive) and the loop body itself (shown as a lambda here, but an anonymous delegate or full-blown named delegate will also work just fine)1.

What about data that doesn't have a known range, such as an IEnumerable<T>, where you don't know how many values may be retrieved? In sequential code, you'd handle that with:


var data = SomeSourceOfIEnumerable();   foreach (var val in data)
{
 DoSomeProcessing(val);
}

To switch this to parallel execution, you can probably guess what you need to do:


var data = SomeSourceOfIEnumerable();   Parallel.ForEach(data, (val) =>
{
 DoSomeProcessing(val);
});

Again, the parallel version is pretty much identical to the sequential but the intent is quite clear. What you should notice from both examples is that we aren't dealing with threads. We aren't having to look at how many cores we have available, or coding up some algorithm for partitioning the data. We simply state the intent, and let the runtime take it's best shot at how to execute the code. And it's the runtime that has all the knowledge as to what state the environment is in right now, so arguably it's in the best position for making those decisions.

In the rare scenarios where it doesn't do it "right", or where you know something about your data source that could lead to a more optimal way of processing, there are various hooks available so that you can take on more responsibility; depending on how long this series gets and on whether anyone asks, we may look at some of them in later posts.

You may also be wondering how to handle exceptions, how to break out of the loop and how to cancel the loop. Those are all important things that are frequently required, and they also tend to mess the code up if you're doing it by hand. In future posts we'll examine each of these and show how it is achieved through the TPL.

As we discussed in the previous post, the examples above assume that the DoSomeProcessing method is itself thread-safe, which is ideally achieved by not accessing any shared state. That doesn't mean that you can't share stuff, just that each loop iteration needs to be careful not to tread on anyone else's toes. For example, this is fine:


var data = GetTenThousandElementArray();Parallel.For(0, 10000, index =>
{
 data[index] = data[index]*data[index];
});

Although each loop iteration is accessing the data array in parallel, each access is only dealing with a single distinct element. Since there is no concurrent access to a single memory location, this code is completely safe. Compare that with:


var sum = 0;
Parallel.For(0, 10000, index =>
{
 sum += data[index];
});

In this code, each iteration is accessing the single sum field and invalid results are likely2. Note that aggregate operations like this are a pretty common requirement, and there is a way to achieve this within the TPL which we'll look at soon. For now, I think that's enough. Next post will be on Task Parallelism, see you there.


1The observant amongst you (and ReSharper) will say that you can replace that lambda as follows:


Parallel.For(0, 10000, DoSomeProcessing);

Feel free to do that in your code, but for this series I'll stick with the longer version just to highlight the similarities with the sequential code.

2Only likely? Well, yes. Here is an example of the fun you can have testing concurrent code - if you write this code and run it, chances are it will work just fine. If you've got a single core machine (or a single core assigned to your VM), then the odds are in fact pretty good that it will run correctly. Why? Well, for it to produce invalid results, a thread context switch has to happen at just the right place. And you've got no control over that, it's all down to the Windows scheduler. What you can be sure of is that it will go wrong shortly after shipping; there's a guy called Murphy who can explain that to you! Later on in this series, I'll talk some more about testing and about the CHESS tool from Microsoft Research that can help out. One step at a time though.