The Task Parallel Library Series - Parallel.For & Parallel.ForEach

In the previous post we discussed the principles behind data parallelism; in this one we'll show how to implement it in .NET 4.0. This will probably be a short post, since it's really simple. Here's a regular sequential loop:


var data = GetTenThousandElementArray();
for (var i = 0; i < 10000; i++)
{
    DoSomeProcessing(data[i]);
}

And here's the parallelised version:


var data = GetTenThousandElementArray();
Parallel.For(0, 10000, i =>
{
    DoSomeProcessing(data[i]);
});

Pretty simple stuff, and highly readable. Just switch from the C# for keyword to a call to the static For() method on the System.Threading.Tasks.Parallel type. In its basic form, this method takes three parameters: the "from" value (inclusive), the "to" value (exclusive) and the loop body itself (shown as a lambda here, but an anonymous method or a named method will also work just fine) [1].
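To make the inclusive/exclusive bounds and the three ways of supplying the loop body concrete, here is a minimal sketch. The class name, the VisitRange helper and the Increment method are hypothetical names invented for this illustration; only Parallel.For itself comes from the TPL.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelForBasics
{
    // Hypothetical shared array for the named-method example below.
    private static readonly int[] Counts = new int[4];

    // Hypothetical named method with the shape Parallel.For expects: void (int).
    private static void Increment(int i)
    {
        Counts[i] += 1; // each iteration touches only its own slot
    }

    // Marks every index that the loop visits, so the bounds can be inspected.
    public static bool[] VisitRange(int fromInclusive, int toExclusive, int size)
    {
        var visited = new bool[size];
        Parallel.For(fromInclusive, toExclusive, i => { visited[i] = true; });
        return visited;
    }

    public static void Main()
    {
        // Indices 2, 3 and 4 are visited; index 5 (the "to" value) is not.
        Console.WriteLine(string.Join(",", VisitRange(2, 5, 6)));

        // The same body supplied three ways.
        Parallel.For(0, Counts.Length, i => { Counts[i] += 1; });               // lambda
        Parallel.For(0, Counts.Length, delegate(int i) { Counts[i] += 1; });    // anonymous method
        Parallel.For(0, Counts.Length, Increment);                              // named method

        // Every slot was incremented once per loop, so each holds 3.
        Console.WriteLine(string.Join(",", Counts));
    }
}
```

Each loop here completes before the next one starts, because Parallel.For does not return until all of its iterations have finished.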

What about data that doesn't have a known range, such as an IEnumerable<T>, where you don't know how many values may be retrieved? In sequential code, you'd handle that with:


var data = SomeSourceOfIEnumerable();
foreach (var val in data)
{
    DoSomeProcessing(val);
}

To switch this to parallel execution, you can probably guess what you need to do:


var data = SomeSourceOfIEnumerable();
Parallel.ForEach(data, val =>
{
    DoSomeProcessing(val);
});

Again, the parallel version is pretty much identical to the sequential one, and the intent is quite clear. What you should notice from both examples is that we aren't dealing with threads. We aren't having to look at how many cores we have available, or coding up some algorithm for partitioning the data. We simply state the intent, and let the runtime take its best shot at how to execute the code. And it's the runtime that has all the knowledge as to what state the environment is in right now, so arguably it's in the best position for making those decisions.
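If you're curious about what the runtime decided on your machine, a rough sketch like the following records which thread handled each value. The Numbers source and the WhoRanWhat class are hypothetical stand-ins for SomeSourceOfIEnumerable and DoSomeProcessing, and the number of threads reported will vary from machine to machine and from run to run.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class WhoRanWhat
{
    // Hypothetical lazy source: the consumer cannot know the count up front.
    public static IEnumerable<int> Numbers(int count)
    {
        for (var i = 0; i < count; i++)
        {
            yield return i;
        }
    }

    // Returns, for each value processed, the managed thread id that handled it.
    public static ConcurrentDictionary<int, int> Run(int count)
    {
        var threadByValue = new ConcurrentDictionary<int, int>();
        Parallel.ForEach(Numbers(count), val =>
        {
            Thread.SpinWait(10000); // simulate a little work
            threadByValue[val] = Thread.CurrentThread.ManagedThreadId;
        });
        return threadByValue;
    }

    public static void Main()
    {
        var result = Run(1000);
        Console.WriteLine("Values processed: " + result.Count);
        Console.WriteLine("Distinct threads used: " + result.Values.Distinct().Count());
    }
}
```

Note that the code never asks for a thread or a core count; it only observes what the runtime chose. A ConcurrentDictionary is used for the bookkeeping because the loop body itself has to be thread-safe.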

In the rare scenarios where it doesn't do it "right", or where you know something about your data source that could lead to a more optimal way of processing, there are various hooks available so that you can take on more responsibility; depending on how long this series gets and on whether anyone asks, we may look at some of them in later posts.

You may also be wondering how to handle exceptions, how to break out of the loop and how to cancel the loop. Those are all things that are frequently required, and they also tend to mess the code up if you're doing them by hand. In future posts we'll examine each of these and show how each is achieved through the TPL.

As we discussed in the previous post, the examples above assume that the DoSomeProcessing method is itself thread-safe, which is ideally achieved by not accessing any shared state. That doesn't mean that you can't share stuff, just that each loop iteration needs to be careful not to tread on anyone else's toes. For example, this is fine:


var data = GetTenThousandElementArray();
Parallel.For(0, 10000, index =>
{
    data[index] = data[index] * data[index];
});

Although each loop iteration is accessing the data array in parallel, each access is only dealing with a single distinct element. Since there is no concurrent access to a single memory location, this code is completely safe. Compare that with:
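One way to convince yourself of this is to compare the parallel result with a sequential one. The sketch below assumes an int array filled with 0 to 9999 (the post doesn't say what GetTenThousandElementArray returns), and the SquareInPlace class name is made up for the example.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class SquareInPlace
{
    // Squares every element in place; each iteration touches only data[index].
    public static int[] SquareAll(int[] data)
    {
        Parallel.For(0, data.Length, index =>
        {
            data[index] = data[index] * data[index];
        });
        return data;
    }

    public static void Main()
    {
        // Assumed input: 0..9999, small enough that the squares fit in an int.
        var input = Enumerable.Range(0, 10000).ToArray();
        var expected = input.Select(x => x * x).ToArray(); // sequential reference
        var actual = SquareAll((int[])input.Clone());

        Console.WriteLine(expected.SequenceEqual(actual)); // prints True
    }
}
```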


var sum = 0;
Parallel.For(0, 10000, index =>
{
    sum += data[index];
});

In this code, every iteration reads and writes the single shared sum variable, so invalid results are likely [2]. Aggregate operations like this are a pretty common requirement, and there is a way to achieve them within the TPL which we'll look at soon. For now, I think that's enough. The next post will be on task parallelism; see you there.


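[1] The observant amongst you (and ReSharper) will say that a lambda which simply forwards its argument can be replaced with a method group. That applies to the Parallel.ForEach example, not to the Parallel.For one, whose lambda receives an index rather than an element:


Parallel.ForEach(data, DoSomeProcessing);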

Feel free to do that in your code, but for this series I'll stick with the longer version just to highlight the similarities with the sequential code.

[2] Only likely? Well, yes. Here is an example of the fun you can have testing concurrent code: if you write this code and run it, chances are it will work just fine. If you've got a single-core machine (or a single core assigned to your VM), then the odds are in fact pretty good that it will run correctly. Why? Well, for it to produce invalid results, two threads have to interleave in the middle of the read-add-write on sum, which on a single core means a thread context switch has to happen at just the right place. You've got no control over that; it's all down to the Windows scheduler. What you can be sure of is that it will go wrong shortly after shipping; there's a guy called Murphy who can explain that to you! Later on in this series, I'll talk some more about testing and about the CHESS tool from Microsoft Research that can help out. One step at a time though.
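If you want to poke at this yourself, the rough sketch below runs the unsafe loop repeatedly and counts how often it disagrees with a sequential sum. The array of ten thousand ones and the RacySum class are assumptions made for the illustration. The mismatch count is deliberately not predicted here: it depends on your hardware and on timing, and may well be zero. For contrast, the sketch also includes a version using Interlocked.Add, which makes each addition atomic; that is just one simple fix, not the TPL aggregation mechanism promised for a later post.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class RacySum
{
    // The post's unsafe pattern: unsynchronised read-add-write on a shared variable.
    public static int UnsafeSum(int[] data)
    {
        var sum = 0;
        Parallel.For(0, data.Length, index => { sum += data[index]; });
        return sum;
    }

    // Same loop, but each addition is performed atomically.
    public static int InterlockedSum(int[] data)
    {
        var sum = 0;
        Parallel.For(0, data.Length, index => { Interlocked.Add(ref sum, data[index]); });
        return sum;
    }

    public static void Main()
    {
        var data = Enumerable.Repeat(1, 10000).ToArray(); // assumed test data
        var expected = data.Sum();                        // sequential reference

        var mismatches = 0;
        for (var run = 0; run < 1000; run++)
        {
            if (UnsafeSum(data) != expected)
            {
                mismatches++;
            }
        }

        Console.WriteLine("Unsafe runs with a wrong total: " + mismatches + " of 1000");
        Console.WriteLine("Interlocked total correct: " + (InterlockedSum(data) == expected));
    }
}
```

A run that reports no mismatches doesn't show the unsafe version is correct, which is exactly the testing problem this footnote is describing.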
