Wednesday, March 26, 2008

Observations from HPCC ’08

Newport, RI March 25-26, 2008

This is my first blog entry, ever….really. Before I talk about the conference I’ll share some credentials. I’ve worked in the high performance computing market for almost ten years. Prior to that, I held various positions ranging from commercial software, networking, and semiconductors to workstations. My roles were always in technical marketing or business development, except for a few short stints in sales….which is good for keeping one’s ego in check.

I would say the HPC market is probably the most interesting segment I’ve ever touched. I’m not saying that because it is where I am now….I say that because it is WHY I am where I am now. The people are interesting, their work is fascinating and, with the exception of a few players (and you know who you are), they are very pleasant. It is a familiar crowd. As a side observation, we need to start attracting more young people into applied science. I’ll save comments on that for another entry.

Enough said….on with the conference observations. I’m going to break my discussion into two sections. The first will deal with the general content of the presentations and the second will cover the business challenges of the market as presented by Dr. Stephen Wheat, Sr. Director in Intel’s HPC group.

Interesting science, interesting technologies…

A number of presenters from the national labs and academia presented work they were doing, with some overview of the science. I found the audience generally attentive. Presentation problem statements were broad enough that listeners could see whether the approach applied to their area of interest. Frankly, some of the science was over my head...but it was still worthwhile.

John Grosh of LLNL had the best quote of the day: “The right answer is always the obvious one, once you find it!” So true in life and in science. Among other things, he described their biggest challenge as the application of computing technology to large scale predictive simulations. As it was explained, massive simulations must present “a result” and include a quantified margin of error from the simulation. Quantifying the margin of error or uncertainty requires lots of data points. This has implications for the size of the data set, which places a load on memory subsystems, file systems, underlying hardware and the management and reliability of a complex computing system.

At the end of his presentation I was struck by the complexity of quantifying margin of error. I see at least three factors that could contribute to the uncertainty. They are:
- Model uncertainty, based on the predictive validity of the model itself
- Platform uncertainty, associated with the accuracy and predictability of a complex system executing the code
- Natural uncertainty, the range of possible results that occur in the system being modeled or simulated

Computational scientists tend to worry about item one, would like to push item two to their systems vendors, and leave item three to the domain experts. Do you agree? Can you think of other factors contributing to uncertainty?
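To make the three factors concrete, here is a toy Monte Carlo sketch of uncertainty quantification. Everything in it (the distributions, the standard deviations, the “true” quantity of 1.0) is my own illustrative assumption, not anything from the talk; the point is simply that reporting “a result” with a quantified margin of error means drawing many samples, which is exactly what drives the data volumes Grosh described.

```python
import random
import statistics

# Toy uncertainty-quantification sketch (illustrative assumptions only):
# sample each source of uncertainty many times, then report a mean
# result with a quantified margin of error.

random.seed(42)

def simulate_once():
    model_error = random.gauss(0.0, 0.05)        # factor 1: model uncertainty
    platform_error = random.gauss(0.0, 0.01)     # factor 2: platform/numerical noise
    natural_variation = random.gauss(0.0, 0.10)  # factor 3: variability in the system itself
    true_quantity = 1.0                          # hypothetical quantity being predicted
    return true_quantity + model_error + platform_error + natural_variation

samples = [simulate_once() for _ in range(10_000)]
mean = statistics.fmean(samples)
stdev = statistics.stdev(samples)
margin = 1.96 * stdev / len(samples) ** 0.5      # 95% confidence interval on the mean
print(f"result = {mean:.3f} +/- {margin:.3f}")
```

Note that tightening the margin by a factor of ten requires a hundred times as many samples, which is one way to see why these simulations stress memory and file systems.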

Dr. David Shaw talked about their work at D.E. Shaw Research. He mentioned they have about sixty technologists and associated staff at the lab. They collaborate with other researchers, typically people who specialize in experimentation to help validate the computational algorithms. They are looking at the interaction between small organic molecules and proteins within the body. Their efforts are aimed at scientific discovery with “a long time horizon”.

As someone who worked with life science researchers for a time, I found the content of this presentation the most intriguing. Dr. Shaw commented that we might find today’s protein folding models to be of dubious value due to the quality of the force field models. D.E. Shaw Research is trying to reduce the time to simulate a millisecond reaction and has structured the problem to eliminate some of the deficiencies it sees in today’s models. In turn, they can reduce the problem to a set of computational ASICs. They have also developed a molecular dynamics code that runs on a machine with these specialized ASICs. As he described it, the machine is incredibly fast and also incredibly focused on a single task. In other words, it is not your father’s general purpose, time-share system….

They obtained the speedups through “the judicious use of arithmetic specialization” and “carefully choreographed communication”. In their system, data moves only when it absolutely must. One wonders whether this approach could trickle down to commercial computing in some capacity. I suspect the two are almost mutually exclusive, which would imply that computational acceleration is always specialized and limited to specific markets, making it unfeasible to pursue as a vendor unless you are a small specialty shop. Do you agree?

Dr. Tim Germann of LANL presented work on a simulation they did to provide content for a federal response plan to a pandemic flu infection. The work was interesting and showed that some of the logical approaches (such as use of the flu vaccine emergency stockpile) would only delay, not mitigate, the impact of a pandemic. They were able to use demographic, population and social contact data to show that a variety of actions, taken in concert, would reduce the impact of a pandemic. The simulation also identified early indicators that would appear some sixty days before the pandemic was evident.
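The delay-versus-mitigate distinction can be seen even in a minimal epidemic model. The sketch below is my own toy SIR model with made-up parameters, not the LANL simulation (which used far richer demographic and contact data); it only illustrates the qualitative point that cutting the effective contact rate, a stand-in for combined interventions, both lowers and delays the epidemic peak.

```python
# Minimal SIR toy model (my own sketch, not the LANL work): shows that
# reducing the contact rate beta lowers and postpones the peak of an
# outbreak. All parameter values are made up for illustration.

def sir_peak(beta, gamma=0.25, population=1_000_000, infected0=10, days=365):
    """Integrate a discrete-time SIR model; return (peak_infected, peak_day)."""
    s, i, r = population - infected0, infected0, 0
    peak_i, peak_day = i, 0
    for day in range(1, days + 1):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        if i > peak_i:
            peak_i, peak_day = i, day
    return peak_i, peak_day

baseline_peak, baseline_day = sir_peak(beta=0.5)    # no intervention
mitigated_peak, mitigated_day = sir_peak(beta=0.32)  # interventions in concert
print(f"baseline:  peak {baseline_peak:,.0f} infected on day {baseline_day}")
print(f"mitigated: peak {mitigated_peak:,.0f} infected on day {mitigated_day}")
```

A model this crude cannot distinguish which interventions matter, which is exactly why the real work needed large-scale computing.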

Truly useful stuff, but how do you take these techniques and use them to model other problems? What is the uncertainty in the simulation? Dr. Don Lamb of the University of Chicago talked about the concept of null physics as it applies to evaluating supernovas, and the same question arose in my mind…is it broadly usable?

I want to know because I’m a business development guy. The implications of broad or narrow applicability do not, by themselves, make the case for vendors to help scientists solve problems, but they do shape the way we approach this as a market. This leads to the second section…

The business challenges in the HPC market….

I should point out that I do not, nor have I ever, worked for Intel. My observations are those of an interested market participant and outsider.

I’ve heard Steve Wheat present a number of times. As with all the Intel folks, his presentations are crisp and “safe” for general public viewing. Steve opened with some general observations about the growth in HPC (greater than 30% of the enterprise server market) and made the appropriate comments about the importance of the market to Intel. It was the kind of “head nodding slide” those of us who present routinely use to make sure the audience is on our side. He then launched into an update on work at Intel that was relevant to HPC. This was good but rather routine, spending some time discussing the reliability implications of deploying smaller geometries. I think it is safe to say this is the kind of conditioning that Intel, AMD and any other processor vendor should be doing to explain to the market that this isn’t easy. The implication for this audience was that “HPC needs to help solve these problems” and it will benefit the entire industry….eventually. He also suggested that the industry think about the implications of multi-core processors on I/O and proposed that I/O be treated as a Grand Challenge problem.

He then spent time talking about the economic challenge of serving the HPC market. My interpretation (not Steve’s words) would characterize the HPC procurement cycle as one that barely allows vendors to recoup R&D costs. Steve pointed out that wins for large deployments typically have terms that penalize failure far more than they reward success. This appears to be a business problem that any sane vendor should avoid. Why pursue a high profile, high risk opportunity with a normal return on investment as the best case? While the PR is good, I can think of other ways to garner good press without putting a company at risk. It feels like the HPC market’s answer to subprime mortgages. Do you agree?

Everyone believes that HPC technology eventually has a “trickle down” benefit to the entire market. However, the payoff is muted because margin degrades with volume and over time. I’m also unsure that the original developers ever see the lion’s share of the margin. Mosaic and Netscape come to mind. Can you think of others either making or disputing this point? Do you agree?

Steve closed with some very thought provoking business slides for an HPC conference. His points could be summarized with the question, “Given the needs of the HPC market and the associated economics, what are the dynamics that allow HPC vendors to make active investments to solve these problems?” He made a case that there needs to be an investment model that allows vendors to recoup R&D costs. I think it is an interesting topic and worth further conversation. Please post your views and questions.


DougOF said...

A piece of business... The URL for the conference is

Second... Even as Wheat claims that working on specific problems isn't good business, the man who revolutionized investing is pouring money into a specific Grand Challenge problem. I've met David Shaw a number of times. I know that altruism is a motivator, but I also believe he intends the final product to be economically self sustaining.

Maybe we're just reaching the logical end of the x86 commodity cycle, which requires new technology for the next leap.

Strypdbass said...

Thanks for including the url.

I don't think it is fair to conclude from Steve's comments that "working on specific problems isn't good business...." Steve opened with a number of slides demonstrating the importance of the HPC market.

It is fair to say the risk/return element of large deals (probably not just in HPC) is skewed to benefit the buyer. Since the audience included a number of government funded agencies, it is perfectly reasonable to propose a shared approach to R&D. When the day is over, it is up to the involved parties to decide whether they want to stay and dance or leave. Besides, an "economically self sustaining" business has different implications for a privately held organization than for a multi-billion dollar publicly traded company.

To the comment that we may have reached the end of the curve with the x86 architecture...I think the Blue Gene guys would agree. However, just as a wide spectrum of problems exists, the set of tools available to address them remains large. I don't see this as the end of the curve for x86. Heck, there are even Alpha systems in use in some research organizations.