Friday, February 10, 2012

Conflicted about HSAs....

I am having a very mixed reaction to the recent public announcements of AMD's Heterogeneous System Architecture (HSA) roadmaps. As anyone who stumbles across this blog should know, I did a lot of work in that area. I really do (or is it "did"?) think it represents a real future of computing silicon. It should reduce overall power and is the roadmap to create truly powerful single-chip computers that do everything. Building hotter FPUs or more cache or more generic cores is a race of diminishing returns.

We already have significant sections of modern x86 CPUs dedicated to special purpose functions. Vector instructions with SSE started us down the path, but now we have instruction sets and silicon that speed encryption and other low level functions. The advantages of this model is a single instruction set and, most importantly, a flat memory space.

It is the overhead of memory copy that actually eats up most of the advantage of dedicated silicon. The problem has to be large enough to be worth shipping around the data. Phil Rogers, Mark Hummel & others at AMD know this well. But so do the teams at NVidia, Intel and every other silicon company in the world.

Unfortunately, the silicon industry has also shown stamping out small power-efficient general purpose cores is easy. The difference in power consumption between on-load dedicated cores as NIC and purpose built silicon is shrinking. However, the design overhead of developing that silicon has not. There are only a handful of compute problems worth solving in silicon, but there should be 1000s of compute functions to which it is worth dedicating a core.

AMD has an approach to coherent memory across the different silicon environments. I know enough of the people involved to be confident the solution is elegant and functional. I have deep confidence that it can be game changing in the HPC space where uber-FLOPS still matter and adoption is a matter of compiling in the libraries.

Unfortunately, I don't think it is industry changing. Current software trends are not toward getting more from a single program, but dividing up the problem into smaller, general purpose compute elements. Software architects realized it was too hard to copy the data, so they moved the compute to the data. Yes, I'm talking about Map/Reduce.

AMD's HSA and Map/Reduce represent two directions for software's use of silicon. In my opinion, the need to speed up specific algorithms has been circumvented already by software architecture. AMD is shooting where the duck was, not where it is going. That is the problem with silicon engineering in the current age. It doesn't move fast enough.

That's why Dan Reed, Burton Smith and others are working for MSFT these days and Peter Ungaro is giving the keynote at a Big Data conference. They are trying to shoot ahead of the duck.

If AMD can improve memory efficiency (e.g. garbage collection) and/or messaging primitives (e.g. collectives) they may not change the industry, but you certainly have a competitive advantage in the modern age of distributed software architectures.

1 comment:

Brock Palen said...

I will be curious to see how the memory bandwidth works out. Even in the cases where GPU's have worked well for us it has not been because of all the cores, but from the massive memory bandwidth.

When the Bulldozer chip was announced (Interlagos) I liked the idea because I already knew most of our apps didn't see much speedup over half the cores in the box. Which was about the memory bandwidth of the machine.

Most users when they say they want the new hot chip, I tell them no you want the new hot platform with the extra memory channel and new DDR10001 memory. You were only using half the flops in your cores anyway.