I’m at NVIDIA’s GPU tech conference in San Jose. The central theme of the conference is that the capabilities of modern GPUs enable substantial performance gains for general computing, not just for graphics, though most of the examples we have seen involve some element of graphical processing. The reason you should care about this is that the gains are huge.
Take Matlab, for example: a popular language and IDE for algorithm development, data analysis and mathematical computation. We were told in the keynote here yesterday that Matlab is offering a parallel computing toolbox based on NVIDIA’s CUDA, with speed-ups of 10 to 40 times. Dramatic performance improvements like this open up new possibilities in computing.
Why has GPU performance advanced so rapidly, whereas CPU performance has levelled off? The reason is that they use different computing models. CPUs are general-purpose. The focus is on fast serial computation, executing a single thread as rapidly as possible. Since many applications are largely single-thread, this is what we need, but there are technical barriers to increasing clock speed. Of course multi-core and multi-processor systems are now standard, so we have dual-core or quad-core machines, with big performance gains for multi-threaded applications.
By contrast, GPUs are designed to be massively parallel. A Tesla C1060 has not 2 or 4 or 8 cores, but 240; the C2050 has 448. These are not the same as CPU cores, but they do execute in parallel. The clock speed is only 1.3GHz, whereas an Intel Core i7 Extreme runs at 3.3GHz, but the Intel CPU has a mere 6 cores. An Intel Xeon 7560 runs at 2.266GHz and has 8 cores. The lower clock speed of the GPU is one reason it is more power-efficient.
NVIDIA’s CUDA initiative is about making this capability available to any application. NVIDIA made changes to its hardware to make it more amenable to standard C code, and delivered CUDA C with extensions to support it. In essence it is pretty simple. The extensions let you specify functions to execute on the GPU, allocate memory for pointers on the GPU, and copy memory between the GPU (called the device) and the main memory on the PC (called the host). You can also synchronize threads and use shared memory between threads.
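To make the pattern concrete, here is a minimal sketch of what those extensions look like in practice: a kernel marked `__global__`, device allocation with `cudaMalloc`, and explicit copies between host and device with `cudaMemcpy`. The kernel name `add` and the launch configuration are illustrative; compiling requires NVIDIA’s nvcc and a CUDA-capable GPU.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// __global__ marks a function (a "kernel") that executes on the device,
// with one thread per array element.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float host_a[1024], host_b[1024], host_c[1024];
    for (int i = 0; i < n; i++) { host_a[i] = (float)i; host_b[i] = 2.0f * i; }

    // Allocate memory on the device (the GPU) and copy the inputs across
    float *dev_a, *dev_b, *dev_c;
    cudaMalloc(&dev_a, bytes);
    cudaMalloc(&dev_b, bytes);
    cudaMalloc(&dev_c, bytes);
    cudaMemcpy(dev_a, host_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, host_b, bytes, cudaMemcpyHostToDevice);

    // Launch the kernel: 4 blocks of 256 threads covers all 1024 elements
    add<<<4, 256>>>(dev_a, dev_b, dev_c, n);

    // Copy the result back from device to host
    cudaMemcpy(host_c, dev_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[10] = %f\n", host_c[10]);

    cudaFree(dev_a); cudaFree(dev_b); cudaFree(dev_c);
    return 0;
}
```

The explicit `cudaMemcpy` calls are the host/device copying discussed below: the data must be shipped to the GPU before the kernel runs and shipped back afterwards.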
The reward is great performance, but there are several disadvantages. One is the challenge of concurrent programming and the subtle bugs it can introduce.
Another is the hassle of copying memory between host and device. The device is in effect a computer within a computer. Shifting data between the two is relatively slow.
A third is that CUDA is proprietary to NVIDIA. If you want your code to work with ATI’s equivalent, called Stream, then you should use the OpenCL library, though I’ve noticed that most people here seem to use CUDA; I presume they are able to specify the hardware and would rather avoid the compromises of a cross-GPU library. In the worst case, if you need to support both CUDA and non-CUDA systems, you might need to maintain different code paths depending on what is detected at runtime.
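Runtime detection of that kind might be sketched like this, using the CUDA runtime’s `cudaGetDeviceCount`; the `process` function and its CPU fallback are hypothetical placeholders for whatever the application actually does.

```cuda
#include <cuda_runtime.h>
#include <stdbool.h>

// Returns true if at least one CUDA-capable device is present,
// so the caller can choose the GPU path or fall back to plain CPU code.
bool cuda_available(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    return (err == cudaSuccess) && (count > 0);
}

// Hypothetical entry point: dispatch to one of two code paths at runtime.
void process(float *data, int n) {
    if (cuda_available()) {
        /* copy data to the device and launch CUDA kernels */
    } else {
        /* plain CPU implementation of the same computation */
    }
}
```

Maintaining both branches is exactly the duplication cost described above, which is part of the appeal of a cross-GPU library like OpenCL.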
It is all a bit messy, though there are tools and libraries to simplify the task. For example, this morning we heard about GMAC, which makes host and device appear to use a single address space, though I imagine there are performance implications.
NVIDIA says it is democratizing supercomputing, bringing high performance computing within reach for almost anyone. There is something in that; but at the same time as a developer I would rather not think about whether my code will execute on the CPU or the GPU. Viewed at the highest level, I find it disappointing that to get great performance I need to bolster the capabilities of the CPU with a specialist add-on. The triumph of the GPU is in a sense the failure of the CPU. Convergence in some form or other strikes me as inevitable.
5 thoughts on “Is the triumph of the GPU the failure of the CPU?”
It’s not a failure of the CPU, it’s just making better use of a specialized resource. Graphics are really just a series of mathematical transformations and calculations, typically on relatively limited datasets (texture sizes notwithstanding). If your problem also happens to look like this (lot of math, relatively little data), then GPU computing is a natural fit and you’ll see huge speed gains. For anything else, you’ll likely stick with a normal CPU, as it better handles non-mathematical processing and has a more evenly matched processing-to-data ratio than a GPU.
Different tools for different tasks. Just because they invent a really awesome velcro design doesn’t mean I’m tossing screws, nails, and glue.
@Joshua Thanks for the comment. Of course we need both CPU and GPU; they are good at different types of processing. The point I’m making is twofold: first, that the pace of performance improvement has been greater with the GPU (and looks to continue that way); and second that some level of convergence is needed to simplify development and improve communication between the two. I’m with the Tesla guys right now and they seem to agree: “a model of computing that combines these two will emerge”. There’s also NVIDIA’s Tegra which combines CPU and GPU, albeit aimed at the mobile market.
AMD is working on convergence between the GPU and the CPU, and so is Intel. It’s pretty much inevitable that the GPU and CPU will become one piece of silicon (Apple’s A4, apparently, already has this convergence).
However, the fallacy is the assumption that performance still matters. Outside of the gaming space and specialized software (like AutoCAD, or Matlab, or climate models, or so), performance doesn’t matter. A low-end computer today is more than capable of filling all the computing needs that you, I or a business have.
That Intel and AMD still focus on performance in their CPUs is, well, more a lack of other selling points and differentiation between the two. Most of the time, my laptop’s Core2Duo is sitting idle, and at a very low clock-speed, no matter if I use Word, Outlook, or Chrome, or am watching a movie.
Where the GPU and CUDA are finding their place is supercomputing, hands down. It’s cheaper, and it’s stuff that can be parallelized with relative ease, anyway.
While performance, as you point out, is not as important as it once was, it’s narrow-minded to say it doesn’t matter.
The more processing power we have, the more people will find a use for it. I doubt Pixar would agree that chasing ever more powerful hardware is a waste of time.
Even for Joe Average, as CPUs have gotten faster, developers have gotten lazier. So if you find your CPU sat idle then you are very lucky; you must not browse web pages with Flash installed, as I find that can STILL bring a decent PC to its knees.
Of course if everything becomes optimised tomorrow so we need less CPU power to do things, what would push the market forward? It’s a lot harder to sell a new computer based on it using a few watts less than it is to sell one because it’s uber fast compared to what you have right now. Even if sometimes that speed increase is more down to having a new hard drive than actual processing speed improvements.
The GPU does not have hundreds of cores. Instead it has a few cores, with very wide vector processing units.
So it’s incorrect to say that a C2050 has 448 cores while a high-end CPU only has 8 cores. The CPU has vector processing units too. For instance a mainstream Core i5 2500 can perform 64 single-precision floating-point operations per cycle (presumably 4 cores, each issuing an 8-wide AVX add and an 8-wide AVX multiply every cycle). Together with a higher clock speed, the performance gap between the CPU and GPU is much smaller than one might think. And because the CPU can handle very complex code it’s still the architecture of choice for anything other than graphics.
But that might eventually change too. The number of cores is still increasing, the vector processing units become more powerful, and we’ll see gather/scatter instruction support at some point. This gives the CPU the same capabilities as the GPU. Software renderers like SwiftShader can take over the task of dedicated graphics chips, starting with IGPs and working its way up from there…
This opens up new possibilities: since the CPU exploits a balanced mix of ILP, DLP and TLP, developers get a lot more freedom to create whatever they want, instead of being limited by what the GPU can run efficiently.
Comments are closed.