NVIDIA CUDA 4.0 simplifies GPU programming, aims for mainstream

NVIDIA has announced CUDA 4.0, a major update to its C++ toolkit for general-purpose programming on the GPU. The idea is to use the many cores of NVIDIA’s GPUs to speed up tasks that may not be graphics-related.

There are three key features:

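Unified Virtual Addressing provides a single address space for the main system RAM and the GPU RAM, or even RAM across multiple GPUs if available. This simplifies programming because a pointer value alone identifies where the data lives, so code no longer has to track host and device pointers separately.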


GPUDirect 2.0 is NVIDIA’s name for peer-to-peer communication between multiple GPUs in the same computer. Instead of copying data from one GPU to main memory and then on to a second GPU, the data can go directly from one GPU to the other.

The Thrust C++ template library is a CUDA library of parallel algorithms modeled on the C++ Standard Template Library (STL). NVIDIA claims that typical Thrust routines are 5 to 100 times faster than their equivalents in the STL or Intel’s Threading Building Blocks. Thrust is not really new but is getting pushed to the mainstream of CUDA programming.
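To make the STL comparison concrete, here is a minimal sketch of the kind of standard-library code that Thrust mirrors, written in plain C++ so it runs without a GPU. The names `Square` and `square_sort_sum` are made up for illustration. The Thrust counterparts named in the comments (`thrust::device_vector`, `thrust::transform`, `thrust::sort`, `thrust::reduce`) are real Thrust names, but they are not compiled here, and this sketch says nothing about the speed-ups NVIDIA claims.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical functor for illustration: squares one element.
// Thrust uses the same functor style (with __host__ __device__ qualifiers).
struct Square {
    int operator()(int x) const { return x * x; }
};

// Squares every element in place, sorts in descending order, returns the sum.
// Rough Thrust mapping (assumption, not compiled here):
//   std::vector<int>  -> thrust::device_vector<int>
//   std::transform    -> thrust::transform
//   std::sort         -> thrust::sort
//   std::accumulate   -> thrust::reduce
int square_sort_sum(std::vector<int>& v) {
    std::transform(v.begin(), v.end(), v.begin(), Square());
    std::sort(v.begin(), v.end(), std::greater<int>());
    return std::accumulate(v.begin(), v.end(), 0);
}
```

Calling `square_sort_sum` on a vector holding 3, -1 and 2 leaves it as 9, 4, 1 and returns 14. The point of the comparison is that a Thrust version keeps the same iterator-based shape, with the container living in GPU memory.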

Other new features include debugging (cuda-gdb) support on Mac OS X, support for new/delete and virtual functions in C++, and improvements to multi-threading.

The common theme of these features is to make it easier for mortals to move from general C/C++ programming to CUDA programming, and to port existing code. This is how NVIDIA sees CUDA’s progress:


Certainly I see increasing interest in GPU programming, and not just among supercomputer researchers.

A weakness is that CUDA only works on NVIDIA GPUs. You can use OpenCL for generic GPU programming, but it is less advanced.

The CUDA 4.0 release candidate will be available from March 4 if you sign up for the CUDA Registered Developer Program.

3 comments to NVIDIA CUDA 4.0 simplifies GPU programming, aims for mainstream

  • Kyle Miller

    I thought CUDA can work with other GPUs but falls back to software emulation when it’s not an nVidia GPU, i.e. runs CUDA on the CPU for the most part. Is this no longer the case?

  • tim

    Sorry, yes you can do that but of course with no benefit.


  • S.A

    Hello, I couldn’t download the CUDA toolkit from any site. Can anyone send me its source?