Friday, April 4, 2008

Belfast: Day 2

Eric still can't get online...

Many Core Reconfigurable Supercomputing Conference update
ECIT, Belfast, Northern Ireland
April 1-3, 2008

In short, the conference consensus is that accelerators are going to be an intergral part of the future computing paradigm. This isn't surprising, given the nature of the conference, but rather than being speculative statements, there was increasing demonstration of community acceptance of heterogenous computing as the next wave of innovation and performance.

Several presentations were made by vendors, SGI, Mitrion and Clearspeed, demonstrating cases where they have had success in proving out performance in real applicaitons. Mitrion with BLAST, Clearspeed with everything from quantum chemistry to financial modeling (but all floating point intensive) and SGI partnering with both of these partners. SGI presentation provided several interesting perspectives.
  • It took a big case (proving 70 FPGAs working together) to begin to draw out interest by many companies in the technology.
  • Now people are approaching SGI on what one can do for them with FPGAs and accelerators. This isn't surprising either, because in this demonstration, SGI stretched the size of the box that was constraining interest in FPGAs.
  • SGI has developed several examples using Quick Assist, but unfortunately, the details of the implementation and interface were not available.
  • It was important to note that Quick Assist focuses on single node acceleration, which is potentially limiting.

Mitrion presented on their language and BLAST example. A primary take home point is that the parallel programming mindset needs to be developed earlier for scientists and programmers alike. Mitrion C helps enforce this mindset. Of course, Mitrion C also emphasized their portability across parallel processor types.

Clearspeed was very interesting because of the speedup and density of performance they are able to achieve. Admittedly SIMD in nature and focused on floating point, the accelerator has a valuable niche, but isn't universal. It seems that Clearspeed is the CM5 coming around
with updated technology. A notable point from Clearspeed was a call for common standards for acceleration, something akin to OpenMP but not OpenMP. One notable point about Clearspeed was the availaiblity of codes that had the Clearspeed implementations.

Several other presentations were given from Alan George regarding CHREC, Craig Steffan from NCSA, Olaf Stoorasli from ORNL. Alan talked primarily about CHREC's effort to move the thought process up to strategy for computing the solution, instead of low-level optimization on a specific processor. A good direction because it provides a more common ground for domain scientists to interact with the application performance experts.

Olaf talked about his work with parallelizing Smith-Waterman on many, many FPGAs and with enough FPGAs, achieved 1000x speedup over a single CPU. This is another example of big cases providing visibility and showing the limit for FPGA computing has a ways to go before finding a limit.

Craig Steffan provided a good overview of NCSA mission which is to bring new computational capabilities to the scientists. Provided good input on necessary steps to have a successful deployment of new computing technologies including
  • Making it easy to keep heterogenous components (object files and bitstreams) together
  • Make decisions at run time on how the application problem will be solved
  • Make documentation available and consistent
  • Access to latest versions (even pre-release) is useful when trying to work around compiler bugs present in early releases

Mike Giles from Oxford presented his experiences in financial modeling using GPGPUs. Showed good success. Commented that standards for GPGPUs will be many years off, but that OpenFPGA is a good sign from the RC community that standards are emerging. Mike also identified that having examples, tools and libraries, student projects and more conferences will be important to getting started in new technologies. For those experienced with parallel programming, it's a 2-4 week learning curve to use CUDA.

Greg Petersen (UT) talked about cyber chemistry virtual center between UT and UIUC. Question for chemists is how to use these machines. Whole research front is on this aspect in order to get to petascale systems. Talked about kernel for QMC applications including general interpolation framework. Looked at efforts using Monte Carlo stochastic methods with random numbers and a Markov process. Significant work on using numerical analysis underlying the chemistry results.

Overall there are many applications using heterogenous acceleration, many in life sciences ranging from MD, to drug docking and Monte Carlo techniques, and nearly all referencing image processing and financial applications that performed well with accelerators. There was overlap in the life sciences space, with nearly every accelerator type demonstrating acceleration for at least one applicaition in this space.

Another significant time block was for the OpenFPGA forum. A show of hands indicated that only about 20% of the audience was aware of OpenFPGA, so I spent 30 minutes on an OpenFPGA overview before moving to the discussion of the general API. Part of the presentation included getting an interest level in assuring open interoperability for accelerators. There were no responses in the negative, many in the affirmative and some undecided.
The GenAPI discussion went pretty well. In short, there were no showstoppers indicating a wrong direction, but more discussion on technical detals of argument specification, what is included and what is not specified. There was a strong interest in having more direction for new areas such as inter-fpga communication, inter-node accelerator communication, etc, although all agreed it was too early to standardize because even the basics had yet to become standard.

There were some comments from those with a lot of history in HPC that the GenAPI looked similar to the model use by Floating Point Systems. There was general consensus that a first standard that is simple is best, allowing common use and then looking to emerging patterns as the basis for future standards. It appeared the application community would accept the standard if it were available.

Summarizing, the conference provided a good overview of work in moving computational science applications to new accelerator technologies that is becoming the new mainstream way to get higher performance for computing. The tools have matured enough that applications are being more broadly developed, and beginning to be deployed.

No comments: