Parallel Programming: five reasons for caution. Reflections from Intel’s Parallel Studio briefing.

I’m just back from an Intel software conference in Salzburg where the main topic was Parallel Studio, a new suite which integrates Intel’s C/C++ compiler, debugging and profiling tools into Visual Studio. To some extent these are updates to existing tools like Thread Checker and VTune, though there are new features such as memory checking in Parallel Inspector (the equivalent of Thread Checker) and a new user interface for Parallel Amplifier (the equivalent of VTune). The third tool in the suite, Parallel Composer, comprises the compiler and libraries including Threading Building Blocks and Intel Integrated Performance Primitives.

It is a little confusing. For Windows developers using Visual Studio, Parallel Studio mostly replaces the earlier products, though we were told that products like VTune have some advanced features that might make you want to stick with them, or use both.

Intel’s fundamental point is that there is no point in having multi-core PCs if the applications we run are unable to take advantage of them. Put another way, you can get remarkable performance gains by converting appropriate routines to use multiple threads, ideally as many threads as there are cores.

James Reinders, Intel’s Chief Evangelist for software products, introduced the products and explained their rationale. He is always worth listening to, and did a good job of summarising the “free lunch is over” argument, and explaining Intel’s solution.

That said, there are a few caveats. Here are five reasons why adding parallelism to your code might not be a good idea:

1. Is it a problem worth solving? Users only care about performance improvements that they notice. If you have a financial analysis application that takes a while to number-crunch its data, then going parallel is a big win. If your application is a classic database forms client, it is probably a waste of time from a performance perspective. You care much more about how well your database server is exploiting multiple threads on the server, because that is likely to be the bottleneck.

There is another reason to do background processing, and that is to keep the user interface responsive. This matters a lot to users. Intel said little about this aspect; Reinders told me it is categorised as “convenience parallelism”. Nevertheless, it is something you probably should be doing, though it requires a different approach from parallelising for performance.
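As a rough illustration of the responsiveness case, here is a minimal Python sketch. It is not taken from the briefing and has no real user interface: the polling loop stands in for a GUI event loop, and the names slow_report and run_with_responsive_loop are hypothetical. The only point is that the slow job goes to a single background worker while the foreground keeps ticking.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_report(n):
    """Stand-in for a long-running job (hypothetical example)."""
    time.sleep(0.2)  # pretend to do slow work such as I/O
    return sum(range(n))


def run_with_responsive_loop(n):
    """Run slow_report in the background and count foreground 'ticks' meanwhile."""
    ticks = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_report, n)
        # A real GUI would return to its event loop here; this loop
        # merely shows that the foreground thread is not blocked.
        while not future.done():
            ticks += 1
            time.sleep(0.01)
        return future.result(), ticks


if __name__ == "__main__":
    result, ticks = run_with_responsive_loop(1000)
    print(result, ticks)
```

Note that one worker is enough here. The goal is to stay responsive, not to finish sooner, which is why this calls for a different design from parallelising for performance.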

2. Will it actually speed up your app? There is an overhead in multi-threading, as you now have to manage the threads as well as performing your calculations. The worst case, according to Reinders, is a dual-core machine, where you have all the overhead but only one additional core. If the day comes when we routinely have, say, 64 cores on our desktop or laptop, then the benefit becomes overwhelming.
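A back-of-envelope model makes the trade-off concrete. The sketch below uses Amdahl’s law, which is my addition rather than anything Intel presented, plus a made-up fixed overhead term. The function name and all the numbers are illustrative assumptions, not measurements.

```python
def estimated_speedup(parallel_fraction, cores, overhead=0.0):
    """Amdahl's law with an extra fixed cost for managing threads.

    parallel_fraction: share of the single-threaded run time that can
                       run in parallel (0..1)
    cores:             number of cores used
    overhead:          threading cost as a fraction of the single-threaded
                       run time (an illustrative assumption)
    """
    serial = 1.0 - parallel_fraction
    parallel_time = serial + parallel_fraction / cores + overhead
    return 1.0 / parallel_time


if __name__ == "__main__":
    # Assumed workload: 90% parallelisable, 5% threading overhead.
    for cores in (2, 4, 64):
        print(cores, round(estimated_speedup(0.9, cores, overhead=0.05), 2))
```

With those assumed figures the model gives roughly 1.7x on two cores and about 6x on 64. It also shows that the serial part of the program caps the gain however many cores you add, and that a routine with little parallel work can end up slower once the overhead is counted.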

3. Is it actually desirable on a multi-tasking operating system? Consider this: an ideally parallelised application, from a performance perspective, is one that uses 100% CPU across all cores until it completes its task. That’s great if it is the only application you are running, but what if you started four of these guys (same or different applications) simultaneously on a quad-core system? Now each application is contending with others, there’s no longer a performance benefit, and most likely the whole system is going to slow down. There is no perfect solution here: sometimes you want an application to go all-out and grab whatever CPU it needs to get the job done as quickly as possible, while sometimes you would prefer it to run with lower priority because there are other things you care about more, such as a responsive operating system, other applications you want to use, or energy efficiency.

This is where something like Microsoft’s concurrency runtime (which Intel will support) could provide a solution. We want concurrent applications to talk to the operating system and to one another, to optimize overall use of resources. This is more promising than simply maxing out on concurrency in every individual application.
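Until applications can negotiate like that, a crude stopgap is for each one to leave some headroom rather than grab every core. The Python sketch below is only that: it has nothing to do with Microsoft’s concurrency runtime, and polite_worker_count, square_all and the reserve parameter are hypothetical names chosen for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def polite_worker_count(total_cores=None, reserve=1):
    """Pick a worker count that leaves `reserve` cores for everything else."""
    if total_cores is None:
        total_cores = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, total_cores - reserve)


def square_all(values, workers):
    """Run a trivial task on a pool capped at `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda v: v * v, values))


if __name__ == "__main__":
    workers = polite_worker_count()
    print(workers, square_all([1, 2, 3], workers))
```

A fixed reserve is a guess made by one application in isolation, which is exactly why a shared runtime that sees the whole system is the more promising route.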

4. Will your code still run correctly? Edward Lee argues in a well-known paper, The Problem with Threads, that multi-threading is too dangerous for widespread use:

Many technologists are pushing for increased use of multithreading in software in order to take advantage of the predicted increases in parallelism in computer architectures. In this paper, I argue that this is not a good idea. Although threads seem to be a small step from sequential computation, in fact, they represent a huge step. They discard the most essential and appealing properties of sequential computation: understandability, predictability, and determinism. Threads, as a model of computation, are wildly nondeterministic, and the job of the programmer becomes one of pruning that nondeterminism. Although many research techniques improve the model by offering more effective pruning, I argue that this is approaching the problem backwards. Rather than pruning nondeterminism, we should build from essentially deterministic, composable components. Nondeterminism should be explicitly and judiciously introduced where needed, rather than removed where not needed.

I put this point to Reinders at the conference. He gave me a rather long answer, saying that it is partly a matter of using the right libraries and tools (Parallel Studio, naturally), and partly a matter of waiting for something better:

Lee articulates the dangers of threading. Did we magically fix it or do we really know what we’re doing in inflicting this on the masses? It really comes down to determinism. If programmers make their program non-deterministic, getting out of that mess is something most programmers can’t do, and if they can it’s horrendously expensive.

He’s right, if we stayed with Windows threads and Pthreads and programming at that level, we’re headed for disaster. What you need to see is tools and programming templates that avoid that. The evil thing is what we call shared mutable state. When you have things happening in parallel, the safest thing you can do is that they’re totally independent. This is one of the reasons that parallelism on servers works so well, in that you do lots and lots of transactions and they don’t bump into each other, or they only interface through the database.

Once we start opening up shared mutable state, encouraging threading, we set ourselves up for disaster. Parallel Inspector can help you figure out what disasters you create and get rid of them, but ultimately the answer is that you need to encourage people to use programming like OpenMP or Threading Building Blocks. Those generally guide you away from those mistakes. You can still make them.

One of the open questions is can you come up with programming techniques that completely avoid the problem? We do have one that we’ve just started talking about called Ct … but I think we’re at the point now where OpenMP and Threading Building Blocks have proven that you can write code with that and get good results.
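To make the “totally independent” idea concrete, here is a small Python sketch of my own. It does not use OpenMP, Threading Building Blocks or Ct, and the function names are hypothetical. Each task works only on its own chunk of input and returns a value; nothing is shared while the tasks run, and a single thread combines the results at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_total(chunk):
    """Works only on its own input; touches no shared variables."""
    return sum(x * x for x in chunk)


def sum_of_squares(values, workers=4):
    """Split the input, process the chunks independently, combine at the end."""
    size = max(1, len(values) // workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(chunk_total, chunks)  # results arrive in input order
        return sum(partials)  # the only merge step, done by one thread


if __name__ == "__main__":
    print(sum_of_squares(list(range(10))))
```

The alternative, having every thread add to one shared total, needs a lock to be correct and is the kind of code where the mistakes Reinders describes creep in. In standard CPython the threads here will not make this CPU-bound work faster; the sketch is about the structure, not the speed.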

Reinders went on to distinguish between three types of concurrent programming, referring to some diagrams by Microsoft’s David Callahan. The first is explicit, unsafe parallelism, where the developer has to get it right. The second is explicit, safe parallelism; the best approach here, according to Reinders, would be to use functional languages, but he thinks it unlikely that they will catch on in the mainstream. The third type is implicit parallelism that’s safe, where the developer does not even have to think about it. An example is the math kernel library in IPP (Intel Integrated Performance Primitives) where you just call an API that returns the right answers, and happens to use concurrency for its work.

Intel also has a project called Ct (C/C++ for Throughput), a dynamic runtime for data parallelism, which Reinders also places in the implicit parallelism category.

It was a carefully nuanced answer, but proceed with caution.

5. Will your application need a complete rewrite? This is a big maybe. Intel’s claim is that many applications can be updated for parallelism with substantial benefits. A guy from Nero did a presentation though, and said that an attempt to parallelise one of their applications, a media transcoder, had failed because the architecture was not right, and it had to be completely redone. So I guess it depends.

This brings to mind another thing which everyone agrees is a hard challenge: how to design an application for effective parallelism. Intel has a tool in preparation called Parallel Advisor, to be part of Parallel Studio at a future date, which is meant to identify candidates for parallelism, but that will not be a complete answer.

Go parallel, or not?

None of the above refutes Intel’s essential point: that effective concurrent programming is essential to the future of computing. This is an evolutionary process though, and at this point there is every reason to be cautious rather than madly parallelising every piece of code you touch.

Additional Links

Microsoft has a handy Parallel Computing home page.

David Callahan: Design considerations for Parallel Programming