nVidia demos CUDA

GEC: Discuss gaming, computers and electronics and venture into the bizarre world of STGODs.

Moderator: Thanas

User avatar
Ace Pace
Hardware Lover
Posts: 8456
Joined: 2002-07-07 03:04am
Location: Wasting time instead of money
Contact:

nVidia demos CUDA

Post by Ace Pace »

Beyond3D, who else?
What's CUDA?



NVIDIA CUDA stands for "Compute Unified Device Architecture" and represents their new software and hardware solution for what they call "GPU Computing," or general-purpose computation on a GPU. It is based on the G80 architecture, and will be compatible with all future G8x derivatives and even NVIDIA's next-generation architectures.

GPGPU solutions are primarily aimed at the scientific, professional and financial markets, although they may eventually have an impact on the PC gaming market and other non-gaming applications too. GPGPU can fundamentally be defined as using a GPU for anything other than real-time image rendering, and it could, financially speaking, become a very large market.

From the software side of things, CUDA is exposed through extensions to the ANSI C language, with only a minimal set of deviations from the standard (no denormal support, no recursive functions, etc.). On the hardware side of things, it simply exposes NVIDIA's DX10 hardware with two key features: shared memory (a parallel data cache) and inter-thread synchronization. Both help improve efficiency beyond that of current GPGPU solutions. More on that in a second.
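
To give a flavour of those C extensions, here's a minimal sketch of a complete CUDA program (our own illustrative example; the kernel name, array size and launch configuration are made up, not taken from NVIDIA's documentation):

Code:

// Illustrative sketch only: scales an array on the GPU via CUDA's C extensions.
#include <cuda_runtime.h>
#include <stdio.h>

// __global__ marks a function that runs on the GPU but is launched from the CPU.
__global__ void scale(float *data, float factor, int n)
{
    // Each thread computes its own element index from CUDA's built-ins.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    const int n = 1024;
    float host[1024];
    for (int i = 0; i < n; ++i)
        host[i] = (float)i;

    float *dev;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // The <<<blocks, threads-per-block>>> triple-bracket launch syntax is
    // one of CUDA's deviations from plain ANSI C.
    scale<<<n / 256, 256>>>(dev, 2.0f, n);

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("host[10] = %f\n", host[10]); /* expect 20.0 */
    return 0;
}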

It should also be said that while NVIDIA CUDA and AMD CTM ("Close To the Metal") address the same market, they sit at very different levels of abstraction. The former is aimed directly at developers and exposes a C-like programming language. AMD CTM, on the other hand, is an assembly-language interface for low-level access to the ATI R5xx product family. As such, CUDA and CTM are not directly comparable without an appropriate backend for CTM, and, sadly, we are not aware of any proper, fully-featured public backend for CTM at this point in time.

Why CUDA?



There are three big disadvantages to using OpenGL and Direct3D for GPGPU development. The first, and most obvious, is that these APIs are designed with rendering in mind, not GPGPU programming; as such, they can be less efficient and a lot less straightforward to use for such workloads. Secondly, new drivers might introduce bugs which could significantly affect general-purpose programs, even more so than they affect rendering. And finally, no modern rendering API exposes direct and arbitrary read/write access to video memory.

Both CUDA and CTM fix all of these issues, and very nicely at that. Furthermore, they can achieve very high efficiency on fundamentally "stream-like" workloads, which is what SIMD processors have traditionally been good at. They can also achieve much higher performance than CPUs could ever dream of for such massively parallel computations.

Generally speaking, CPUs are good at single-threaded workloads, although they have recently been adding a small level of thread parallelism with HyperThreading and multi-core chips. GPUs, on the other hand, are inherently massively parallel: there are literally thousands of threads in flight on the GPU at any given time. That's why it's also very unlikely your GPU would be very good at word processing - there's not much parallelism to extract there.

NVIDIA aims to further differentiate and distance themselves from CTM by going above and beyond what traditional SIMD machines can do, however, potentially gaining greater efficiency for certain workloads.

CUDA Hardware Model

[Image: diagram of the CUDA hardware model]

CUDA introduces "shared memory" (also known as the "parallel data cache"), which corresponds to a pool of memory shared by all the processors of a multiprocessor (an ALU cluster; each processor is basically an ALU with little control logic of its own).

You can think of shared memory as a miniature version of the local store on the Cell architecture. It's basically a manually managed cache which lets a clever programmer significantly reduce the amount of memory bandwidth his or her program needs.

Furthermore, CUDA introduces inter-thread synchronization. Any thread running on a multiprocessor can synchronize with the other threads running there, which allows a number of algorithms to run on the GPU with much nicer performance characteristics, in theory. And in practice too, as far as we can see. Overall, the hardware model is very easy to program for, and quite efficient too.
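
To make the shared memory and synchronization points concrete, here's a sketch of a block-wise sum reduction (again our own illustrative code, assuming a fixed 256-thread block; it is not taken from NVIDIA's samples):

Code:

// Illustrative sketch: each 256-thread block sums its 256 input elements
// using the parallel data cache, writing one partial sum per block.
__global__ void block_sum(const float *in, float *out)
{
    __shared__ float cache[256];   // the per-multiprocessor shared memory

    int tid = threadIdx.x;
    cache[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();               // barrier: inter-thread synchronization

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();           // everyone must finish before the next step
    }

    if (tid == 0)
        out[blockIdx.x] = cache[0];
}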

The paradigm is also fundamentally different from that of current CPUs, which are single-threaded and memory latency-intolerant, and thus hide latency with large caches. GPUs, on the other hand, are massively multithreaded and memory latency-tolerant: they simply hide latency by switching to another execution thread. This is a fundamental difference, and some researchers and engineers are already predicting that it is impossible to create a single architecture that is well suited to both kinds of workloads.

Conclusion



In the future, an increasingly high proportion of the GPU's transistors will be dedicated to arithmetic processing, rather than texturing or rasterization. As such, there is tremendous potential for them to improve GPGPU performance significantly faster than Moore's Law in the next few years.

Furthermore, new features such as double precision processing (at quarter-speed or below, but still with very impressive raw performance) will extend the reach of the GPGPU market further inside the realm of scientific supercomputing. Needless to say, the future of GPU Computing is very exciting, and extremely promising.

While many applications and workloads are NOT suited to CUDA or CTM, because they are inherently not massively parallel, the number of potential applications for the technology remains incredibly high. Much of that market has also historically been part of the server CPU market, which has very high margins and is very lucrative. So, unsurprisingly, NVIDIA is quite financially excited to enter that market. Hopefully the goodies will benefit PC consumers just as much as scientists and researchers in the longer term, however.
A rather more in-depth look.
Brotherhood of the Bear | HAB | Mess | SDnet archivist |
User avatar
MKSheppard
Ruthless Genocidal Warmonger
Ruthless Genocidal Warmonger
Posts: 29842
Joined: 2002-07-06 06:34pm

Post by MKSheppard »

English, please. How will this change anything, or will it be another MMX?
"If scientists and inventors who develop disease cures and useful technologies don't get lifetime royalties, I'd like to know what fucking rationale you have for some guy getting lifetime royalties for writing an episode of Full House." - Mike Wong

"The present air situation in the Pacific is entirely the result of fighting a fifth rate air power." - U.S. Navy Memo - 24 July 1944
User avatar
The Kernel
Emperor's Hand
Posts: 7438
Joined: 2003-09-17 02:31am
Location: Kweh?!

Post by The Kernel »

MKSheppard wrote:English, please. How will this change anything, or will it be another MMX?
The point of this article is that GPUs are massively parallel floating point monsters that, thanks to the advent of more flexible pixel processing units, can be utilized for applications beyond graphics processing.

The net effect of this is that with properly written software, certain applications that are friendly towards threading and parallelization are going to see orders-of-magnitude increases in performance by tapping these unused GPU resources.

In essence it's very similar to what IBM/Sony had in mind with the Cell processor (although it was silly to put it in a game console, but I digress), except that it uses existing hardware.
User avatar
Master of Cards
Jedi Master
Posts: 1168
Joined: 2005-03-06 10:54am

Post by Master of Cards »

The Kernel wrote:
MKSheppard wrote:English, please. How will this change anything, or will it be another MMX?
The point of this article is that GPUs are massively parallel floating point monsters that, thanks to the advent of more flexible pixel processing units, can be utilized for applications beyond graphics processing.

The net effect of this is that with properly written software, certain applications that are friendly towards threading and parallelization are going to see orders-of-magnitude increases in performance by tapping these unused GPU resources.

In essence it's very similar to what IBM/Sony had in mind with the Cell processor (although it was silly to put it in a game console, but I digress), except that it uses existing hardware.
3D rendering is where this will be very useful. It could speed up reactor calcs, lower render times and help many other programs, so renders don't take 25 hours.
User avatar
Ace Pace
Hardware Lover
Posts: 8456
Joined: 2002-07-07 03:04am
Location: Wasting time instead of money
Contact:

Post by Ace Pace »

Master of Cards wrote:3D rendering is where this will be very useful. It could speed up reactor calcs, lower render times and help many other programs, so renders don't take 25 hours.


English, please. How will this change anything,
Jesus Christ, people, read the article.

This has NOTHING to do with 3D rendering and everything to do with enabling programmers to use a gigantic parallel floating point processor.

We already have things which speed up 3D rendering. They are called professional workstation GPUs, QuadroFX and the like.
Brotherhood of the Bear | HAB | Mess | SDnet archivist |
User avatar
Arrow
Jedi Council Member
Posts: 2283
Joined: 2003-01-12 09:14pm

Post by Arrow »

CUDA looks like a good, fast and cheap way to do DSP calculations. It won't compete with the real-time power found in FPGAs or ASICs, but it could give traditional DSP chips a run for their money (having never used a DSP processor or CUDA, I can't tell either way).

But the obvious applications for CUDA in games are physics, probably some forms of AI, and audio (massive overkill for audio, though). I can also see researchers using it to cheaply crunch through their models.
User avatar
Ace Pace
Hardware Lover
Posts: 8456
Joined: 2002-07-07 03:04am
Location: Wasting time instead of money
Contact:

Post by Ace Pace »

Arrow wrote:CUDA looks like a good, fast and cheap way to do DSP calculations. It won't compete with the real-time power found in FPGAs or ASICs, but it could give traditional DSP chips a run for their money (having never used a DSP processor or CUDA, I can't tell either way).

But the obvious applications for CUDA in games are physics, probably some forms of AI, and audio (massive overkill for audio, though). I can also see researchers using it to cheaply crunch through their models.
The question is, can we get CUDA and a DirectX program running at the same time? Then SLIed 8800GTXs might actually have a use besides being a gigantic e-penis. :D
Brotherhood of the Bear | HAB | Mess | SDnet archivist |
User avatar
salm
Rabid Monkey
Posts: 10296
Joined: 2002-09-09 08:25pm

Post by salm »

Ace Pace wrote:
Master of Cards wrote:3D rendering is where this will be very useful. It could speed up reactor calcs, lower render times and help many other programs, so renders don't take 25 hours.


English, please. How will this change anything,
Jesus Christ, people, read the article.

This has NOTHING to do with 3D rendering and everything to do with enabling programmers to use a gigantic parallel floating point processor.

We already have things which speed up 3D rendering. They are called professional workstation GPUs, QuadroFX and the like.
He's not talking about realtime rendering but about rendering an image from a 3D program. This process requires no GPU speed, only CPU speed. And if I understand correctly, this new GPU could support the CPU, which means this does make rendering images faster.
User avatar
Ace Pace
Hardware Lover
Posts: 8456
Joined: 2002-07-07 03:04am
Location: Wasting time instead of money
Contact:

Post by Ace Pace »




He's not talking about realtime rendering but about rendering an image from a 3D program. This process requires no GPU speed, only CPU speed. And if I understand correctly, this new GPU could support the CPU, which means this does make rendering images faster.
From what I know, the QuadroFX chips and such can work together with some 3D rendering programs to significantly speed up rendering.

And on that note:


[Image: chart of per-application CUDA speedups on the 8800GTX]

No one should need help reading this.
Brotherhood of the Bear | HAB | Mess | SDnet archivist |
User avatar
salm
Rabid Monkey
Posts: 10296
Joined: 2002-09-09 08:25pm

Post by salm »

Ace Pace wrote: From what I know, the QuadroFX chips and such can work together with some 3D rendering programs to significantly speed up rendering.
Interesting. I never heard of that.

Anyway, that wouldn't change the fact that this CUDA stuff cranks up the speed of rendering.

And on that note:

<picturesnip>

No one should need help reading this.
Yeah, I can read it, but I have no idea what it means. :wink:
User avatar
Ace Pace
Hardware Lover
Posts: 8456
Joined: 2002-07-07 03:04am
Location: Wasting time instead of money
Contact:

Post by Ace Pace »

Oh, right, caption.

That's the 8800GTX with CUDA versus a 2.6GHz C2D. Clearer now?
Brotherhood of the Bear | HAB | Mess | SDnet archivist |
User avatar
salm
Rabid Monkey
Posts: 10296
Joined: 2002-09-09 08:25pm

Post by salm »

Ace Pace wrote:Oh, right, caption.

That's the 8800GTX with CUDA versus a 2.6GHz C2D. Clearer now?
Nope. A person with average knowledge of graphics cards, like myself, has no clue what terms like "Matrix Numerics", "Wave Equation" and "Biological Sequence Match" mean. I guess the higher the bar in the picture, the better, but that's about all I can derive from this image.
User avatar
The Kernel
Emperor's Hand
Posts: 7438
Joined: 2003-09-17 02:31am
Location: Kweh?!

Post by The Kernel »

People, what Ace is trying to say is that CUDA isn't about gaming or graphics at all. It's about speeding up anything that has two things:

1) A lot of floating point (decimal) math.
2) A lot of calculations which are not dependent on each other (in other words very easy to parallelize).

To put this in perspective, CPUs are extremely generalized processors that are not especially fast at any one thing, but can handle any kind of computation you throw at them. GPUs, on the other hand, are extremely specialized math processors that only handle a few kinds of operations, but the ones they do handle, they can do hundreds of times faster than a CPU.
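
To make that concrete, here's a toy kernel (my own illustration, nothing from the article): every output element depends only on its own inputs, so the GPU is free to run thousands of these threads at once.

Code:

// Toy example of work that fits both criteria: lots of independent
// floating point operations with no dependencies between elements.
__global__ void saxpy(float a, const float *x, const float *y,
                      float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a * x[i] + y[i];  // each element stands alone
}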
User avatar
Master of Ossus
Darkest Knight
Posts: 18213
Joined: 2002-07-11 01:35am
Location: California

Post by Master of Ossus »

salm wrote:Nope. A person with average knowledge of graphics cards, like myself, has no clue what terms like "Matrix Numerics", "Wave Equation" and "Biological Sequence Match" mean. I guess the higher the bar in the picture, the better, but that's about all I can derive from this image.
Matrix operations (and vector ones) are critically important in fields such as statistics (and related ones like economics), physics processing, etc. The height of the bar is the speedup multiplier, so for instance, from the chart, matrix operations are ten times faster using the 8800 under CUDA than they would be if done by the CPU they're comparing it against.
"Sometimes I think you WANT us to fail." "Shut up, just shut up!" -Two Guys from Kabul

Latinum Star Recipient; Hacker's Cross Award Winner

"one soler flar can vapririze the planit or malt the nickl in lass than millasacit" -Bagara1000

"Happiness is just a Flaming Moe away."
User avatar
Arrow
Jedi Council Member
Posts: 2283
Joined: 2003-01-12 09:14pm

Post by Arrow »

salm wrote:
Ace Pace wrote:
Master of Cards wrote:3D rendering is where this will be very useful. It could speed up reactor calcs, lower render times and help many other programs, so renders don't take 25 hours.


English, please. How will this change anything,
Jesus Christ, people, read the article.

This has NOTHING to do with 3D rendering and everything to do with enabling programmers to use a gigantic parallel floating point processor.

We already have things which speed up 3D rendering. They are called professional workstation GPUs, QuadroFX and the like.
He's not talking about realtime rendering but about rendering an image from a 3D program. This process requires no GPU speed, only CPU speed. And if I understand correctly, this new GPU could support the CPU, which means this does make rendering images faster.
Render farm applications, correct?
User avatar
salm
Rabid Monkey
Posts: 10296
Joined: 2002-09-09 08:25pm

Post by salm »

Oh, I thought all that stuff had some special meaning regarding graphics cards, but now I see that they have their regular meanings.
Meh, brainfart.
User avatar
salm
Rabid Monkey
Posts: 10296
Joined: 2002-09-09 08:25pm

Post by salm »

Arrow wrote:
salm wrote:
Ace Pace wrote: Jesus Christ, people, read the article.

This has NOTHING to do with 3D rendering and everything to do with enabling programmers to use a gigantic parallel floating point processor.

We already have things which speed up 3D rendering. They are called professional workstation GPUs, QuadroFX and the like.
He's not talking about realtime rendering but about rendering an image from a 3D program. This process requires no GPU speed, only CPU speed. And if I understand correctly, this new GPU could support the CPU, which means this does make rendering images faster.
Render farm applications, correct?
Well, any program that renders your 3D model to a complete image. Usually you can use these programs within a render farm, but I don't think being part of a render farm in itself requires a significant amount of CPU speed.
User avatar
phongn
Rebel Leader
Posts: 18487
Joined: 2002-07-03 11:11pm

Post by phongn »

The Kernel wrote:People, what Ace is trying to say is that CUDA isn't about gaming or graphics at all. It's about speeding up anything that has two things:

1) A lot of floating point (decimal) math.
This is a bit of a bump, but can CUDA actually do decimal math instead of floating-point?
User avatar
Sam Or I
Jedi Council Member
Posts: 1894
Joined: 2002-07-12 12:57am
Contact:

Post by Sam Or I »

It needs a Hemi.
User avatar
Master of Cards
Jedi Master
Posts: 1168
Joined: 2005-03-06 10:54am

Post by Master of Cards »

Salm, try making 300 boxes in Max's reactor and dropping them. See how long it takes to create the animation; if the floating point hardware were opened up, it could go faster. Some GPUs have drivers to do this.