nVidia demos CUDA

GEC: Discuss gaming, computers and electronics and venture into the bizarre world of STGODs.

Moderator: Thanas

User avatar
Ace Pace
Hardware Lover
Posts: 8456
Joined: 2002-07-07 03:04am
Location: Wasting time instead of money
Contact:

nVidia demos CUDA

Post by Ace Pace »

Beyond3D, who else?
What's CUDA?



NVIDIA CUDA stands for "Compute Unified Device Architecture" and represents their new software and hardware solution for what they call "GPU Computing," or general-purpose computation on a GPU. It is based on the G80 architecture, and will be compatible with all future G8x derivatives and even NVIDIA's next-generation architectures.

GPGPU solutions are primarily aimed at the scientific, professional and financial markets, although they may eventually have an impact on the PC gaming market and other non-gaming applications too. GPGPU can fundamentally be defined as using a GPU for anything other than real-time image rendering, and it could, financially speaking, become a very large market.

From the software side of things, CUDA is exposed through extensions to the ANSI C language, with only a minimal set of deviations from the standard (no denormal support, no recursive functions, etc.). On the hardware side of things, it simply exposes NVIDIA's DX10 hardware with two key features: shared memory (a parallel data cache) and inter-thread synchronization. Both help improve efficiency beyond that of current GPGPU solutions. More on that in a second.
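
To give a flavour of those C extensions, here's a minimal sketch of a complete CUDA program (our own illustrative example; the kernel name, array size and launch configuration are made up, not taken from NVIDIA's documentation):

Code:

// Illustrative sketch only: scales an array on the GPU via CUDA's C extensions.
#include <cuda_runtime.h>
#include <stdio.h>

// __global__ marks a function that runs on the GPU but is launched from the CPU.
__global__ void scale(float *data, float factor, int n)
{
    // Each thread computes its own element index from CUDA's built-ins.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    const int n = 1024;
    float host[1024];
    for (int i = 0; i < n; ++i)
        host[i] = (float)i;

    float *dev;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // The <<<blocks, threads-per-block>>> triple-bracket launch syntax is
    // one of CUDA's deviations from plain ANSI C.
    scale<<<n / 256, 256>>>(dev, 2.0f, n);

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("host[10] = %f\n", host[10]); /* expect 20.0 */
    return 0;
}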

It should also be said that while NVIDIA CUDA and AMD CTM ("Close To the Metal") address the same market, they sit at very different levels of abstraction. The former is aimed directly at developers and exposes a C-like programming language. AMD CTM, on the other hand, is an assembly-language interface for low-level access to the ATI R5xx product family. As such, CUDA and CTM are not directly comparable without an appropriate backend for CTM, and, sadly, we are not aware of any proper, fully-featured public backend for CTM at this point in time.

Why CUDA?



There are three big disadvantages to using OpenGL and Direct3D for GPGPU development. The first, and most obvious, is that these APIs are designed with rendering in mind, not GPGPU programming; as such, they can be less efficient and a lot less straightforward to use for such workloads. Secondly, new drivers might introduce bugs which could significantly affect general-purpose programs, even more so than they affect rendering. And finally, no modern rendering API exposes direct and arbitrary read/write access to video memory.

Both CUDA and CTM fix all of these issues, and very nicely at that. Furthermore, they can achieve very high efficiency on fundamentally "stream-like" workloads, which is what SIMD processors have traditionally been good at. They can also achieve much higher performance than CPUs could ever dream of for such massively parallel computations.

Generally speaking, CPUs are good at single-threaded workloads, although they have recently been adding a small level of thread parallelism with HyperThreading and multi-core chips. GPUs, on the other hand, are inherently massively parallel: there are literally thousands of threads in flight on the GPU at any given time. That's why it's also very unlikely your GPU would be very good at word processing - there's not much parallelism to extract there.

NVIDIA aims to further differentiate and distance themselves from CTM by going above and beyond what traditional SIMD machines can do, however, potentially gaining greater efficiency for certain workloads.

CUDA Hardware Model

[Image: diagram of the CUDA hardware model]

CUDA introduces "shared memory" (also known as the "parallel data cache"), which corresponds to a pool of memory shared by all the processors of a multiprocessor (an ALU cluster; each processor is basically an ALU with little control logic of its own).

You can think of shared memory as a miniature version of the local store on the Cell architecture. It's basically a manually managed cache which lets a clever programmer significantly reduce the amount of memory bandwidth his or her program needs.

Furthermore, CUDA introduces inter-thread synchronization. Any thread running on a multiprocessor can synchronize with the other threads running there, which allows a number of algorithms to run on the GPU with much nicer performance characteristics, in theory. And in practice too, as far as we can see. Overall, the hardware model is very easy to program for, and quite efficient too.
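
To make the shared memory and synchronization points concrete, here's a sketch of a block-wise sum reduction (again our own illustrative code, assuming a fixed 256-thread block; it is not taken from NVIDIA's samples):

Code:

// Illustrative sketch: each 256-thread block sums its 256 input elements
// using the parallel data cache, writing one partial sum per block.
__global__ void block_sum(const float *in, float *out)
{
    __shared__ float cache[256];   // the per-multiprocessor shared memory

    int tid = threadIdx.x;
    cache[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();               // barrier: inter-thread synchronization

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();           // everyone must finish before the next step
    }

    if (tid == 0)
        out[blockIdx.x] = cache[0];
}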

The paradigm is also fundamentally different from that of current CPUs, which are single-threaded and memory latency-intolerant, and thus hide latency with large caches. GPUs, on the other hand, are massively multithreaded and memory latency-tolerant: they simply hide latency by switching to another execution thread. This is a fundamental difference, and some researchers and engineers are already predicting that it is impossible to create a single architecture that is well suited to both kinds of workloads.

Conclusion



In the future, an increasingly high proportion of the GPU's transistors will be dedicated to arithmetic processing, rather than texturing or rasterization. As such, there is tremendous potential for them to improve GPGPU performance significantly faster than Moore's Law in the next few years.

Furthermore, new features such as double precision processing (at quarter-speed or below, but still with very impressive raw performance) will extend the reach of the GPGPU market further inside the realm of scientific supercomputing. Needless to say, the future of GPU Computing is very exciting, and extremely promising.

While many applications and workloads are NOT suited to CUDA or CTM, because they are inherently not massively parallel, the number of potential applications for the technology remains incredibly high. Much of that market has also historically been part of the server CPU market, which has very high margins and is very lucrative. So, unsurprisingly, NVIDIA is quite financially excited to enter that market. Hopefully the goodies will benefit PC consumers just as much as scientists and researchers in the longer term, however.
A rather more in-depth look.
Brotherhood of the Bear | HAB | Mess | SDnet archivist |
User avatar
MKSheppard
Ruthless Genocidal Warmonger
Ruthless Genocidal Warmonger
Posts: 29842
Joined: 2002-07-06 06:34pm

Post by MKSheppard »

English, please. How will this change anything, or will it be another MMX?
"If scientists and inventors who develop disease cures and useful technologies don't get lifetime royalties, I'd like to know what fucking rationale you have for some guy getting lifetime royalties for writing an episode of Full House." - Mike Wong

"The present air situation in the Pacific is entirely the result of fighting a fifth rate air power." - U.S. Navy Memo - 24 July 1944
User avatar
The Kernel
Emperor's Hand
Posts: 7438
Joined: 2003-09-17 02:31am
Location: Kweh?!

Post by The Kernel »

MKSheppard wrote:English, please. How will this change anything, or will it be another MMX?
The point of this article is that GPUs are massively parallel floating point monsters that, thanks to the advent of more flexible pixel processing units, can be utilized for applications beyond graphics processing.

The net effect of this is that with properly written software, certain applications that are friendly towards threading and parallelization are going to see orders-of-magnitude increases in performance by tapping these unused GPU resources.

In essence it's very similar to what IBM/Sony had in mind with the Cell processor (although it was silly to put it in a game console, but I digress), except that it uses existing hardware.
User avatar
Master of Cards
Jedi Master
Posts: 1168
Joined: 2005-03-06 10:54am

Post by Master of Cards »

The Kernel wrote:
MKSheppard wrote:English, please. How will this change anything, or will it be another MMX?
The point of this article is that GPUs are massively parallel floating point monsters that, thanks to the advent of more flexible pixel processing units, can be utilized for applications beyond graphics processing.

The net effect of this is that with properly written software, certain applications that are friendly towards threading and parallelization are going to see orders-of-magnitude increases in performance by tapping these unused GPU resources.

In essence it's very similar to what IBM/Sony had in mind with the Cell processor (although it was silly to put it in a game console, but I digress), except that it uses existing hardware.
3D rendering is where this will be very useful. It could speed up reactor calcs, lower render times and help many other programs, so renders don't take 25 hours.
User avatar
Ace Pace
Hardware Lover
Posts: 8456
Joined: 2002-07-07 03:04am
Location: Wasting time instead of money
Contact:

Post by Ace Pace »

Master of Cards wrote:3D rendering is where this will be very useful. It could speed up reactor calcs, lower render times and help many other programs, so renders don't take 25 hours.


English, please. How will this change anything,
Jesus Christ, people, read the article.

This has NOTHING to do with 3D rendering and everything to do with enabling programmers to use a gigantic parallel floating point processor.

We already have things which speed up 3D rendering. They are called professional workstation GPUs, QuadroFX and the like.
Brotherhood of the Bear | HAB | Mess | SDnet archivist |
User avatar
Arrow
Jedi Council Member
Posts: 2283
Joined: 2003-01-12 09:14pm

Post by Arrow »

CUDA looks like a good, fast and cheap way to do DSP calculations. It won't compete with the real-time power found in FPGAs or ASICs, but it could give traditional DSP chips a run for their money (having never used a DSP processor or CUDA, I can't tell either way).

But the obvious applications for CUDA in games are physics, probably some forms of AI, and audio (massive overkill for audio, though). I can also see researchers using it to cheaply crunch through their models.
User avatar
Ace Pace
Hardware Lover
Posts: 8456
Joined: 2002-07-07 03:04am
Location: Wasting time instead of money
Contact:

Post by Ace Pace »

Arrow wrote:CUDA looks like a good, fast and cheap way to do DSP calculations. It won't compete with the real-time power found in FPGAs or ASICs, but it could give traditional DSP chips a run for their money (having never used a DSP processor or CUDA, I can't tell either way).

But the obvious applications for CUDA in games are physics, probably some forms of AI, and audio (massive overkill for audio, though). I can also see researchers using it to cheaply crunch through their models.
The question is, can we get CUDA and a DirectX program running at the same time? Then SLIed 8800GTXs might actually have a use besides being a gigantic e-penis. :D
Brotherhood of the Bear | HAB | Mess | SDnet archivist |
User avatar
salm
Rabid Monkey
Posts: 10296
Joined: 2002-09-09 08:25pm

Post by salm »

Ace Pace wrote:
Master of Cards wrote:3D rendering is where this will be very useful. It could speed up reactor calcs, lower render times and help many other programs, so renders don't take 25 hours.


English, please. How will this change anything,
Jesus Christ, people, read the article.

This has NOTHING to do with 3D rendering and everything to do with enabling programmers to use a gigantic parallel floating point processor.

We already have things which speed up 3D rendering. They are called professional workstation GPUs, QuadroFX and the like.
He's not talking about realtime rendering but about rendering an image from a 3D program. This process requires no GPU speed, only CPU speed. And if I understand correctly, this new GPU could support the CPU, which means this does make rendering images faster.
User avatar
Ace Pace
Hardware Lover
Posts: 8456
Joined: 2002-07-07 03:04am
Location: Wasting time instead of money
Contact:

Post by Ace Pace »




He's not talking about realtime rendering but about rendering an image from a 3D program. This process requires no GPU speed, only CPU speed. And if I understand correctly, this new GPU could support the CPU, which means this does make rendering images faster.
From what I know, the QuadroFX chips and such can work together with some 3D rendering programs to significantly speed up rendering.

And on that note:


[Image: chart of per-application CUDA speedups on the 8800GTX]

No one should need help reading this.
Brotherhood of the Bear | HAB | Mess | SDnet archivist |
User avatar
salm
Rabid Monkey
Posts: 10296
Joined: 2002-09-09 08:25pm

Post by salm »

Ace Pace wrote: From what I know, the QuadroFX chips and such can work together with some 3D rendering programs to significantly speed up rendering.
Interesting. I never heard of that.

Anyway, that wouldn't change the fact that this CUDA stuff cranks up the speed of rendering.

And on that note:

<picturesnip>

No one should need help reading this.
Yeah, I can read it, but I have no idea what it means. :wink:
User avatar
Ace Pace
Hardware Lover
Posts: 8456
Joined: 2002-07-07 03:04am
Location: Wasting time instead of money
Contact:

Post by Ace Pace »

Oh, right, caption.

That's the 8800GTX with CUDA versus a 2.6GHz C2D. Clearer now?
Brotherhood of the Bear | HAB | Mess | SDnet archivist |
User avatar
salm
Rabid Monkey
Posts: 10296
Joined: 2002-09-09 08:25pm

Post by salm »

Ace Pace wrote:Oh, right, caption.

That's the 8800GTX with CUDA versus a 2.6GHz C2D. Clearer now?
Nope. A person with average knowledge of graphics cards, like myself, has no clue what terms like "Matrix Numerics", "Wave Equation" and "Biological Sequence Match" mean. I guess the higher the bar in the picture, the better, but that's about all I can derive from this image.
User avatar
The Kernel
Emperor's Hand
Posts: 7438
Joined: 2003-09-17 02:31am
Location: Kweh?!

Post by The Kernel »

People, what Ace is trying to say is that CUDA isn't about gaming or graphics at all. It's about speeding up anything that has two things:

1) A lot of floating point (decimal) math.
2) A lot of calculations which are not dependent on each other (in other words very easy to parallelize).

To put this in perspective, CPUs are extremely generalized processors that are not especially fast at any one thing, but can handle any kind of computation you throw at them. GPUs, on the other hand, are extremely specialized math processors that only handle a few kinds of operations, but the ones they do handle, they can do hundreds of times faster than a CPU.
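
To make that concrete, here's a toy kernel (my own illustration, nothing from the article): every output element depends only on its own inputs, so the GPU is free to run thousands of these threads at once.

Code:

// Toy example of work that fits both criteria: lots of independent
// floating point operations with no dependencies between elements.
__global__ void saxpy(float a, const float *x, const float *y,
                      float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a * x[i] + y[i];  // each element stands alone
}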
User avatar
Master of Ossus
Darkest Knight
Posts: 18213
Joined: 2002-07-11 01:35am
Location: California

Post by Master of Ossus »

salm wrote:Nope. A person with average knowledge of graphics cards, like myself, has no clue what terms like "Matrix Numerics", "Wave Equation" and "Biological Sequence Match" mean. I guess the higher the bar in the picture, the better, but that's about all I can derive from this image.
Matrix operations (and vector ones) are critically important in fields such as statistics (and related ones like economics), physics processing, etc. The height of the bar is the speedup multiplier, so for instance, from the chart, matrix operations are ten times faster using the 8800 under CUDA than they would be if done by the CPU they're comparing it against.
"Sometimes I think you WANT us to fail." "Shut up, just shut up!" -Two Guys from Kabul

Latinum Star Recipient; Hacker's Cross Award Winner

"one soler flar can vapririze the planit or malt the nickl in lass than millasacit" -Bagara1000

"Happiness is just a Flaming Moe away."
User avatar
Arrow
Jedi Council Member
Posts: 2283
Joined: 2003-01-12 09:14pm

Post by Arrow »

salm wrote:
Ace Pace wrote:
Master of Cards wrote:3D rendering is where this will be very useful. It could speed up reactor calcs, lower render times and help many other programs, so renders don't take 25 hours.


English, please. How will this change anything,
Jesus Christ, people, read the article.

This has NOTHING to do with 3D rendering and everything to do with enabling programmers to use a gigantic parallel floating point processor.

We already have things which speed up 3D rendering. They are called professional workstation GPUs, QuadroFX and the like.
He's not talking about realtime rendering but about rendering an image from a 3D program. This process requires no GPU speed, only CPU speed. And if I understand correctly, this new GPU could support the CPU, which means this does make rendering images faster.
Render farm applications, correct?
User avatar
salm
Rabid Monkey
Posts: 10296
Joined: 2002-09-09 08:25pm

Post by salm »

Oh, I thought all that stuff had some special meaning regarding graphics cards, but now I see that they have their regular meanings.
Meh, brainfart.
User avatar
salm
Rabid Monkey
Posts: 10296
Joined: 2002-09-09 08:25pm

Post by salm »

Arrow wrote:
salm wrote:
Ace Pace wrote: Jesus Christ, people, read the article.

This has NOTHING to do with 3D rendering and everything to do with enabling programmers to use a gigantic parallel floating point processor.

We already have things which speed up 3D rendering. They are called professional workstation GPUs, QuadroFX and the like.
He's not talking about realtime rendering but about rendering an image from a 3D program. This process requires no GPU speed, only CPU speed. And if I understand correctly, this new GPU could support the CPU, which means this does make rendering images faster.
Render farm applications, correct?
Well, any program that renders your 3D model to a complete image. Usually you can use these programs within a render farm, but I don't think being part of a render farm in itself requires a significant amount of CPU speed.
User avatar
phongn
Rebel Leader
Posts: 18487
Joined: 2002-07-03 11:11pm

Post by phongn »

The Kernel wrote:People, what Ace is trying to say is that CUDA isn't about gaming or graphics at all. It's about speeding up anything that has two things:

1) A lot of floating point (decimal) math.
This is a bit of a bump, but can CUDA actually do decimal math instead of floating-point?
User avatar
Sam Or I
Jedi Council Member
Posts: 1894
Joined: 2002-07-12 12:57am
Contact:

Post by Sam Or I »

It needs a Hemi.
User avatar
Master of Cards
Jedi Master
Posts: 1168
Joined: 2005-03-06 10:54am

Post by Master of Cards »

Salm, try making 300 boxes in Max's reactor and dropping them. See how long it takes to create the animation; if the floating point hardware were opened up, it could go faster. Some GPUs have drivers to do this.