Over 220 registered attendees, though that does include a strong showing of vendors. The room looks like 190 plus.
Seven corporate sponsors. Three of the supernationals are sponsors. Key names include Keith Gray, Henri Calandara & Ch-- Wong.
Scott Morton of Hess on his experience with seismic algorithms on GPUs
What he wants
- 10x price performance (commensurate with the improvement from SC to x86 clusters)
- Commodity volumes
- significant parallelism
- "easy to program"
They've looked at a variety of hardware platforms
- Commodity. Not easy to move the algorithms, especially for large data sets
- Worked with SRC on a wave equation algorithm in 2003
- 10x performance and 10x cost
- Programming is graphical, which doesn't map to skills and tools.
- I wonder what the cost/performance number looks like 5 years later since the FPGA vendors claim much higher than Moore's Law improvement in gates per $.
- Tracking. Believes it is commodity, but hasn't dug into it yet.
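To make the price/performance goal concrete, here is a minimal sketch that scores a platform by dividing its speedup by its cost multiple. The function name and structure are my own illustration, not anything shown in the talk; the only inputs taken from the notes are the 10x target and the SRC figures of 10x performance at 10x cost.

```python
def price_performance_gain(speedup, cost_ratio):
    """Return the price/performance improvement over a baseline.

    speedup:    how many times faster the new platform runs
    cost_ratio: how many times more the new platform costs
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return speedup / cost_ratio


TARGET_GAIN = 10.0  # the 10x price/performance goal from the talk

# SRC FPGA result from the notes: 10x performance at 10x cost.
src_gain = price_performance_gain(speedup=10.0, cost_ratio=10.0)
print(f"SRC price/performance gain: {src_gain:.1f}x")
print(f"Meets 10x target: {src_gain >= TARGET_GAIN}")
```

On those numbers the FPGA route is a wash on price/performance, which fits with Hess continuing to look at other platforms.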
Hess worked with PeakStream and got a 5 to 10x speedup in 2D, but only 2 to 3x in 3D for Kirchhoff. Once Google bought them, they disappeared. Now he's working with Nvidia Cg.
Comments on CUDA
- Relatively easy to program, but hard to optimize
- Two day course was useful
- Used on Kirchhoff, Reverse-time & Wave Equation algorithms.
- Showing ongoing work... This is not final!
Optimization focused on minimizing data movement between the GPU and CPU, with an emphasis on keeping most of the compute on the GPU.
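A rough cost model shows why the data-movement focus matters. This is only a sketch: the step count, compute time, and link speed below are hypothetical values I picked for illustration, not measurements from the talk. The one figure borrowed from the notes is the roughly 1 GB problem size.

```python
def naive_time(steps, compute_s, transfer_s):
    """Copy the data to the GPU and back on every step."""
    return steps * (compute_s + 2 * transfer_s)


def resident_time(steps, compute_s, transfer_s):
    """Copy in once, run every step on the GPU, copy out once."""
    return steps * compute_s + 2 * transfer_s


# Hypothetical numbers, chosen only for illustration (not from the talk).
STEPS = 1000        # iterations of the algorithm
COMPUTE_S = 0.010   # assumed GPU compute time per step, in seconds
TRANSFER_S = 0.250  # one copy of ~1 GB over an assumed 4 GB/s link

naive = naive_time(STEPS, COMPUTE_S, TRANSFER_S)
resident = resident_time(STEPS, COMPUTE_S, TRANSFER_S)
print(f"copy every step: {naive:.1f} s")
print(f"data kept on GPU: {resident:.1f} s")
```

With these made-up inputs the copies dominate the run time unless the data stays on the GPU, which is the trade-off the optimization work is aimed at.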
The reverse-time algorithm is dominated by 3D FFTs. The results are 5x over a single CPU, which is 20% faster than a dual node quad core. Notably (and this is really interesting...) 1 quad core and 2 quad cores deliver the same performance.
Wave Equation Migration is an implicit solver. Prototype performance is 48x a single 3.6 GHz Xeon. The core performance flattens at 4 quad-core CPUs; 8 is no better.
(These problems are all 32-bit and just over 1 GB in size.)
They have ordered a 32-node system. In theory, 32 nodes of GPU plus dual quad-core should outperform the current 4k Xeon cluster.
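As a back-of-the-envelope check on that claim, the sketch below works out how much faster each new node has to be than one unit of the existing cluster. It assumes "4k" means 4000 Xeon CPUs; the notes do not say whether the figure counts CPUs, cores, or nodes, nor how many GPUs each node drives, so treat the result as illustrative.

```python
def required_node_speedup(baseline_units, new_nodes):
    """Speedup each new node needs, relative to one baseline unit, to break even."""
    if new_nodes <= 0:
        raise ValueError("new_nodes must be positive")
    return baseline_units / new_nodes


# Assumption: "4k Xeon cluster" is read as 4000 Xeon CPUs.
BASELINE_XEONS = 4000
GPU_NODES = 32  # size of the system on order, from the notes

needed = required_node_speedup(BASELINE_XEONS, GPU_NODES)
print(f"Each GPU node must match about {needed:.0f} baseline Xeons")
```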
Q: What is the hardware platform?
A: The external Tesla boxes connected via PCIe cables.
Q: Do you really think it will meet price/performance targets?
A: Yes, but we still need to develop the production code.
Q: Experience with heat density?
A: New hardware, so there are no problems so far :)
Q: Are slides going to be available?
A: Yes (I'll add them in the comments)
Q: Why the asymptotic performance on CPU scaling?
A: Memory bandwidth
Q: What about IEEE 754 & double precision?
A: We don't care about that right now.
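On the CPU scaling question, the "memory bandwidth" answer can be illustrated with a simple bound: sustained throughput is the smaller of what the cores can compute and what the memory system can feed them. The sketch below uses hypothetical machine and kernel parameters that I chose so the curve flattens at 4 cores; they are not figures from the talk.

```python
def sustained_gflops(cores, per_core_gflops, mem_bw_gbs, flops_per_byte):
    """Throughput is capped by compute or by memory traffic, whichever is lower."""
    compute_limit = cores * per_core_gflops
    memory_limit = mem_bw_gbs * flops_per_byte
    return min(compute_limit, memory_limit)


# Hypothetical machine and kernel parameters, for illustration only.
PER_CORE_GFLOPS = 4.0  # assumed peak per core
MEM_BW_GBS = 10.0      # assumed shared memory bandwidth, GB/s
FLOPS_PER_BYTE = 1.6   # assumed arithmetic intensity of the kernel

for cores in (1, 2, 4, 8):
    rate = sustained_gflops(cores, PER_CORE_GFLOPS, MEM_BW_GBS, FLOPS_PER_BYTE)
    print(f"{cores} cores: {rate:.1f} GFLOP/s")
```

Once the memory limit is reached, adding cores changes nothing, which is the same shape as the "8 is no better" result reported for the CPU runs.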