Programming NVIDIA GPUs and Intel MIC with directives: OpenACC vs OpenMP

Last month I was at Intel’s software conference learning about Many Integrated Core (MIC), the company’s forthcoming accelerator card for HPC (High Performance Computing). This month I am in San Jose for NVIDIA’s GPU Technology Conference learning about the latest developments in NVIDIA’s platform for accelerated massively parallel computing using GPU cards and the CUDA architecture. The approaches taken by NVIDIA and Intel have much in common – focus on power efficiency, many cores, accelerator boards with independent memory space controlled by the CPU – but also major differences. Intel’s boards have familiar x86 processors, whereas NVIDIA’s have GPUs which require developers to learn CUDA C or an equivalent such as OpenCL.

In order to simplify this, NVIDIA and partners Cray, CAPS and PGI announced OpenACC last year, a set of directives which, when added to C, C++ or Fortran code, instruct the compiler to run code parallelised on the GPU, or potentially on other accelerators such as Intel MIC. The OpenACC folk have stated from the outset their hope and intention that OpenACC will converge with OpenMP, an existing standard for directives enabling shared memory parallelisation. In its current form, OpenMP is not suitable for accelerators, since it assumes that all threads share one memory whereas accelerators have their own memory space.
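To make the difference concrete, here is a minimal sketch of the same loop annotated both ways. The function names are my own, purely for illustration, and the clauses follow my reading of the OpenACC 1.0 syntax rather than any vendor sample. The OpenMP version needs no data clauses because host threads share memory; the OpenACC version states what must be copied to and from the accelerator.

```c
#include <stdio.h>

/* OpenMP: the loop is split across host threads that all see the same
   memory, so nothing needs to be said about where x and y live.
   (Function names here are hypothetical, chosen for this example.) */
void saxpy_openmp(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* OpenACC: the accelerator has its own memory space, so the directive
   also describes data movement. copyin sends x to the device; copy
   sends y to the device and brings the result back afterwards. */
void saxpy_openacc(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

A compiler without OpenMP or OpenACC support enabled ignores the pragmas and runs both loops serially with the same result, which is a large part of the appeal of directives: the source remains ordinary C.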

One thing that puzzled me though: Intel clearly stated at last month’s event that it would support OpenMP (not OpenACC) on MIC, due to go into production at the end of this year or early next. How can this be?

I took the opportunity here at NVIDIA’s conference to ask Duncan Poole, who is NVIDIA’s Senior Manager for High Performance Computing and also the President of OpenACC, about what is happening with these two standards. How can Intel implement OpenMP on MIC, if it is not suitable for accelerators?

“I think OpenMP in the form that’s being discussed inside of the sub-committee is suitable. There’s some debate about some of the specific features that continues. Also, in the OpenMP committee they’re trying to address the concerns of TI and IBM so it’s a broader discussion than just the Intel architecture. So OpenMP will be useful on this class of processor. What we needed to do is not wait for it. That standard, if we’re lucky it will be draft at the end of this year, and maybe a year later will be ratified. We want to unify this developer base now,” Poole told me.

How similar will this adapted OpenMP be to what OpenACC is now?

“It’s got the potential to be quite close. The guy that drafted OpenACC is the head of that sub-committee. There’ll probably be changes in keywords, but there’s also some things being proposed now that were not conceived of. So there’s good debate going on, and I expect that we’ll benefit from it.

“Some of the features for example that are shared by Kepler and MIC with respect to nested parallelism are very useful. Nested parallelism did not exist at the time that we started this work. So there’ll be an evolution that will happen and probably a logical convergence over time.”

If OpenMP is not set to support accelerators until two years hence, what can Intel be doing with it?

“It will be a vendor implementation of a pre-release standard. Something like that,” said Poole, emphasising that he cannot speak for Intel. “To be complimentary to Intel, they have some good ideas and it’s a good debate right now.”

Incidentally, I also asked Intel about OpenACC last month, and was told that the company has no plans to implement it in its compilers. OpenMP is the standard it supports.

The topic is significant, in that if a standard set of directives is supported across both Intel’s and NVIDIA’s HPC platforms, developers can easily port code from one to the other. You can do this today with OpenCL, but converting an application to use OpenCL to enhance performance is a great deal more effort than adding directives.