March 22, 2012

Kepler is *NOT* the GPGPU wonder that nVidia announced!

nVidia unveiled the GTX 680 today, and it’s a mixed bag: good game performance (but a bad performance/price and performance/watt ratio), incredibly poor GPGPU performance on OpenCL, and, worse still, laughable double-precision (DP) floating-point performance!

The new nVidia architecture, code-named Kepler, should have been on your desk a quarter ago, but once again nVidia could not deliver what it promised when it promised it. What nVidia promised was a new GPGPU architecture that would literally crush AMD Radeon GPUs as well as its own Fermi architecture. What we get today is exactly the opposite: a new architecture that is decent for games, but incredibly bad for GPGPU computing!

nVidia presented few details of the Kepler architecture. While AMD Radeon went from SIMD VLIW5 to VLIW4 and finally to a simple SIMD model with the new GCN architecture, nVidia decided to step back from its move to 32 cores grouped in a warp, then 16 cores grouped in a half-warp, and now logically groups 48 CUDA cores together. While Fermi’s groups of 48 cores have 64KB of shared memory/cache, Kepler’s groups of 192 cores (4X more) only have access to the same 64KB of shared memory/cache, and the number of available registers is 2X lower. Kepler doesn’t improve on Fermi; it’s just less efficient, putting more pressure on a slow memory bus (compared to the current Radeon 7970) and making divergence troublesome.
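To make the shared memory squeeze concrete, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above (64KB shared by a group of 48 cores on Fermi versus 192 cores on Kepler). The even split per core and the helper name `shared_bytes_per_core` are assumptions made for illustration, not how the hardware actually allocates memory.

```python
KB = 1024  # bytes per kilobyte


def shared_bytes_per_core(shared_kb, cores):
    """Average shared memory/cache per core, assuming an even split (illustrative only)."""
    return shared_kb * KB / cores


# Figures as quoted in the post: 64KB per group on both architectures.
fermi = shared_bytes_per_core(64, 48)
kepler = shared_bytes_per_core(64, 192)

print(f"Fermi : {fermi:.0f} bytes per core")
print(f"Kepler: {kepler:.0f} bytes per core")
print(f"Kepler has {fermi / kepler:.1f}X less shared memory per core")
```

With the same 64KB spread over 4X more cores, each core gets a quarter of what it had on Fermi, which is why kernels that lean on shared memory should feel the pressure first.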

Kepler GPGPU performance

The Sandra 2012 GPGPU benchmark shows the Radeon HD7970 (4230) being 40% faster than the GTX 680 (3000), and the Radeon HD7870 posts the same score as the GTX 680! In double precision, the HD7870 is 20% faster than the GTX 680 and the HD7970 is 5.4X faster!
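For readers who want to check the percentages, here is a small Python sketch that turns two benchmark scores into a "percent faster" figure, using the Sandra 2012 scores quoted above. The function names are hypothetical helpers, and the sketch assumes that a higher score is better.

```python
def speedup(score_a, score_b):
    """How many times faster A is than B, assuming higher scores are better."""
    return score_a / score_b


def percent_faster(score_a, score_b):
    """Same comparison expressed as a percentage advantage of A over B."""
    return (speedup(score_a, score_b) - 1.0) * 100.0


# Sandra 2012 GPGPU scores as quoted in the post.
hd7970_score = 4230
gtx680_score = 3000

print(f"HD7970 vs GTX 680: {speedup(hd7970_score, gtx680_score):.2f}X, "
      f"{percent_faster(hd7970_score, gtx680_score):.0f}% faster")
```

Note that "X faster" and "% faster" are not interchangeable: a 5.4X speedup corresponds to a 440% advantage, not 540%.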

Same problem with LuxMark GPU, where the GTX 680 scored 284 and the HD7970 784, 3.4X faster! We don’t have results for the same scene on the HD7870, so we don’t know its score, but our guess is that it would also beat the GTX 680!

AMD decided to unleash full DP power on the Radeon HD 79xx, with a 1:4 DP:SP ratio, to focus more on SP performance on the HD 78xx with a 1:8 ratio, and to keep the DP units on the HD 77xx with a 1:16 ratio, which is enough to validate or debug OpenCL DP software but not enough for HPC. nVidia, on the other hand, limited its flagship GTX 680 to a 1:16 ratio, in order to sell expensive Tesla units for DP GPGPU computation.
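To see what these DP:SP ratios mean in practice, the Python sketch below scales a single-precision throughput figure down by each ratio. The ratios are the ones quoted above. The 1000 GFLOPS single-precision baseline is a made-up placeholder, not a measured or official number for any of these cards, and `dp_throughput` is a hypothetical helper.

```python
def dp_throughput(sp_gflops, ratio_denominator):
    """Peak DP throughput for a card whose DP:SP ratio is 1:ratio_denominator."""
    return sp_gflops / ratio_denominator


# DP:SP ratios as quoted in the post (1:N stored as N).
dp_sp_ratios = {
    "Radeon HD 79xx": 4,
    "Radeon HD 78xx": 8,
    "Radeon HD 77xx": 16,
    "GeForce GTX 680": 16,
}

# ASSUMPTION: the same placeholder SP baseline for every card, purely to
# compare the effect of the ratio alone. Real SP peaks differ per card.
PLACEHOLDER_SP_GFLOPS = 1000.0

for card, denominator in dp_sp_ratios.items():
    dp = dp_throughput(PLACEHOLDER_SP_GFLOPS, denominator)
    print(f"{card:16s} 1:{denominator:<2d} -> {dp:6.1f} DP GFLOPS per 1000 SP GFLOPS")
```

At equal SP throughput, a 1:4 card delivers four times the DP throughput of a 1:16 card, so the ratio alone explains a large part of the double-precision gap.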


In fact, for GPGPU computing you cannot compare the $599 GeForce GTX 680 to the $549 Radeon HD7970: depending on the benchmark, it lands between the $449 HD7950 and the $349 HD7870 (and sometimes even below the latter!). So you’d better consider OpenCL and the new AMD Radeon GCN architecture for high-performance computing, for at least the next year!

*** UPDATED *** To correct an error on the register file size and cache size, thanks to srdja!

