Results

Kyle maintains a results repository on the Google Public Data Explorer here

Notes on these results:

When no result was measured (for instance, if you try to run a double-precision test on ION), a zero is shown.
The CPU results come from OpenCL code that was written for GPUs. This helps us get an idea of performance portability but, in practice, does not represent the highest attainable performance for the CPU. The same is true of the FFT and GEMM results on non-NV GPUs (these kernels are based on the highly tuned version by Vasily Volkov).
Measurements which include PCIe time are not comparable across GPUs, since PCIe time is much more dependent on the CPU/Chipset than the GPU.
I don’t have a GTX580 or GTX480 in-house, so results for those devices are limited to CUDA 3.2.
Results are shown for the largest problem size a device could handle in a reasonable amount of time (size 4 for discrete GPUs and Sandy Bridge, size 3 for ION and Nehalem).
GPDE also requires data that spans time, but some characteristics (like the number of cores in a GPU) obviously don’t change. So, for now, I have faked some time values. I’m still debating what’s best to do here. Eventually, I’d like to have the nice scatter chart animation for CUDA/OpenCL over time, but for now it’s more useful for me to see the discrete values at different releases.
The Stencil benchmark changed units (from seconds to GFLOPS), so some jumps are larger than they should be. Results from releases before NV 4.1, Intel 1.5, and AMD 2.6 are in seconds; results from those releases onward are in GFLOPS.
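For anyone post-processing an export of these numbers, two of the notes above (a zero means "not measured", and the Stencil unit depends on the release) can be handled with a small helper. This is only a rough sketch: the function names and the idea of passing a vendor label plus a release string are assumptions for illustration, not part of the results repository. Only the cutoff releases come from the notes.

```python
from typing import Optional, Tuple

# First release per vendor that reports Stencil in GFLOPS (taken from the notes).
# The vendor labels used as keys are an assumption about how rows are tagged.
GFLOPS_SINCE = {"NV": (4, 1), "Intel": (1, 5), "AMD": (2, 6)}


def parse_release(text: str) -> Tuple[int, ...]:
    """Turn a release string such as '4.1' into a comparable tuple (4, 1)."""
    return tuple(int(part) for part in text.split("."))


def stencil_unit(vendor: str, release: str) -> str:
    """Return the unit a Stencil result is reported in for this release."""
    if parse_release(release) >= GFLOPS_SINCE[vendor]:
        return "GFLOPS"
    return "seconds"


def measured(value: float) -> Optional[float]:
    """Map the zero placeholder to None so it is not mistaken for a real result."""
    return None if value == 0 else value


if __name__ == "__main__":
    print(stencil_unit("NV", "4.0"))   # release before the unit change
    print(stencil_unit("AMD", "2.6"))  # release at the unit change
    print(measured(0.0))               # placeholder for "no result"
```

Comparing releases as integer tuples rather than as strings or floats keeps a release like 2.10 ordered after 2.6.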
