Tag Archives: concurrency

New OpenACC compiler directives announced for GPU accelerated programming

A new standard for accelerating C/C++ programming with compiler directives has been announced at the SC11 Supercomputing conference in Seattle. The new standard is called OpenACC  and has been created by NVIDIA, Cray, PGI (Portland Group) and CAPS enterprise.

OpenACC compiler directives are code annotations that enable the compiler to parallelise code while ensuring thread-safety. The big difference between OpenACC and the existing OpenMP standard is that OpenACC primarily targets the GPU rather than CPU, whereas OpenMP is generally CPU only. That said, OpenACC can also target the CPU so it is flexible; the idea is that it will adapt to the target system.


OpenACC is “defined to be interoperable with OpenMP” according to the FAQ and the OpenACC CEO hopes for some future integration, though it seems to have been developed independently which may cause some tension.

OpenACC is expected to ship during the first half of 2012 on compilers from PGI, Cray and CAPS Enterprise. The NVIDIA involvement may make you wonder whether it is GPU-specific; the answer is “maybe”. The FAQ says:

Will OpenACC run on AMD GPUs?

– It could, it requires implementation, there is no reason why it couldn’t

Will OpenACC run on top of OpenCL?

– It could, it requires implementation, there is no reason why it couldn’t

Will AMD/Intel/MS/XX support this?

– As this is just announced we can’t speak to the rate of external adoption or participation.

Will OpenACC run on NVIDIA GPUs with CUDA?

– Yes. Programmers may wish to develop some code using directives, and more sophisticated code using CUDA C, CUDA C++ or CUDA Fortra

Spot the Yes in the above! Still, you can scarcely blame NVIDIA for supporting its own GPU family; and I have been impressed with how the company works with the scientific and academic community to realise the potential of massively parallel computing.

OpenACC is about democratising parallelism, rather than advancing the state of the art. Best optimisation is obtained by more complex programming, but directives make some remarkable performance improvements easy to achieve.

NVIDIA talks up GPU computing, presents roadmap

At the NVIDIA GPU Technology Conference in San Jose CEO Jen-Hsun Huang talked up the company’s progress in GPU computing, showed some example applications, and announced a high-level roadmap for future graphics chip architectures. NVIDIA has three areas of focus, he said: the Quadro line for visualisation, Tesla for parallel computing, and GeForce/Tegra for personal computing. Tegra is a system on a chip aimed at mobile devices. Mobile, says Huang, is “a completely disruptive force to all of computing.”

NVIDIA’s current chip architecture is called Fermi. The company is settling on a two-year product cycle and will deliver Kepler in 2011 with 3 to 4 times the performance (expressed as Gigaflops per watt) of Fermi. Maxwell in 2013 will have around 12 times the performance of Fermi. In between these architecture changes, NVIDIA will do “kicker” updates to refresh its products, with one for Fermi due soon.

The focus of the conference though is not on super-fast graphics cards in themselves, but rather on using the GPU for general purpose computing. GPUs are very, very good at doing mathematics fast and in parallel. If you have an application that does intensive calculations, then executing that part of the code on the GPU can offer impressive performance increases. NVIDIA’s CUDA library for C lets you do exactly that. Another option is OpenCL, a standard that works across GPUs from multiple vendors.

Adobe uses CUDA for the Mercury Playback engine in Creative Suite 5, greatly improving performance in After Effects, Premiere Pro and Photoshop, but with the annoyance that you have to use a compatible NVIDIA graphics card.

The performance gain from GPU programming is so great that it is unavoidable for applications in relevant areas, such as simulation or statistical analysis. Huang gave a compelling example during the keynote, bringing heart surgeon Dr Michael Black on stage to talk about his work. Operating on a beating heart is difficult because it presents a moving target. By combining robotic surgery with software that is able to predict the heart’s movement through simulation, he is researching how to operate on a heart almost as if it were stopped and with just a small incision.

Programming the GPU is compelling, but difficult. NVIDIA is keen to see it become part of mainstream programming, for obvious reasons, and there are new libraries and tools which help with this, like Parallel Nsight for Visual Studio 2010. Another interesting development, announced today, is CUDA for x86, being developed by PGI, which will let your CUDA code run even when an NVIDIA GPU is not present. Even if the performance gains are limited, it will mean developers who need to support diverse systems can run the same code, rather than having a different code path when no CUDA GPU is detected.

That said, GPU programming still has all the challenges of concurrent development, prone to race conditions and synchronization problems.

Stuffing a server full of GPUs is a cost-effective route to super-computing. I took a brief look at the exhibition, which includes this Colfax CXT8000 with 8 Tesla GPUs; it also has three 1200W power supplies. It may cost $25,000 but if you look at the performance you are getting for the price, machines like this are great value.


Anders Hejlsberg on functional programming, programming futures

At TechDays in Belgium Micrososft’s C# designer and Technical Fellow Anders Hejlsberg spoke on trends in programming languages; you can watch the video here.

I recommend it highly, not so much because of any new or surprising content, but because in his low-key way Hejlsberg is a great communicator. The talk is mostly not about the far future, and much of what he covers relates directly to C# 4.0 and F# as found in Visual Studio 2010. Despite his personal investment in C#, Hejlsberg talks cheerfully about the benefits of F# and gives perhaps the best overview of functional programming I have heard, explaining what it is and why it is well suited to concurrency.


I will not try and summarise the whole talk here; but will bring out its unifying thought, which is that programming is moving towards a style that emphasises the “what” rather than the “how” of the tasks it encodes. This fits with a number of other ideas: greater abstraction, more declarative, more use of DSLs (domain-specific languages).

The example he gives early on describes how to get a count of groups in a set of data. You can do this using a somewhat manual approach, iterating through the data, identifying the groups, storing them somewhere, and incrementing the count as items belonging to each group are discovered.

Alternatively you can code it in one shot using the count keyword in LINQ or SQL (though Hejlsberg talked about LINQ). This is an example of using a DSL (Domain Specific Language), and also demonstrates a “what” rather than “how” approach to code. It is easier for another programmer to see your intention, as there is no need to analyse a set of loops and variables to discover what they do.

There is another reason to prefer this approach. Since the implementation is not specified, the compiler can more easily optimise your code; you do not care provided the result is correct. This becomes hugely important when it comes to concurrency, where we want the compiler or runtime to utilise many CPU cores if they are available. He has a screenshot of Task Manager running on a 128-core machine which apparently exists in Redmond (I can’t quite read the figure for total RAM but think it may be 128GB):


Hejlsberg says there was a language doldrums between 1995 and 2005, when many assumed that Java was the be-all and end-all. I wonder if this is a tacit admission that C#, which he was working on during that period, is not that different in philosophy from Java? The doldrums are over and we now have an explosion of new and revived languages: Ruby, Groovy, Python, Clojure, Boo, Erlang, F#, PowerShell and more. However, Hejlsberg says it makes sense for these to run on an existing framework – in practice either the Java or .NET runtime – since the benefits are so great.

Hejsberg also predicts that distinctions such as dynamic versus static languages will disappear as each language absorbs the best features from other languages. “Traditional taxonomies of languages are breaking down as languages pick paradigms from each other,” he says. The new language paradigm is multi-paradigm.

Just as C# has now acquired dynamic features, we can expect it to get better support for immutability in future (borrowed from functional languages).