Even though the major US national laboratories are only now starting to take delivery of the supercomputers they ordered a few years back, the long and complex development process for these projects means that the US Department of Energy (DOE) is already focusing on the round of supercomputers for the next decade. Under the Exascale Computing Project, the DOE expects to develop and order one (and in the end, likely several) exaFLOPS-capable supercomputers, 50 times more powerful than the generation of supercomputers being installed now.
The project is a long-term effort expected to take several years altogether, and the Department of Energy and its laboratories have already been working on it for nearly two years, slowly building towards ordering the final computer. To that end, today the project is taking its next step forward with the announcement that the DOE is awarding $258 million in research contracts to six of the US’s leading technology companies.
At a high level, the significance of this project is more than just supplying an exascale system: a major goal of the project is to figure out how to build such a system. Researchers have known for some time that traditional supercomputing paradigms won’t scale very well to exaFLOPS-level performance, as power efficiency, reliability, and interconnect performance would all struggle at those performance levels. As a result, to get the exascale systems the DOE ultimately would like to have – and to get those systems in a timely fashion to ensure US leadership in the field of supercomputing – it has taken a greater role in the research and development of the required technologies under the PathForward program.
Under PathForward, the department is awarding a total of $258 million in R&D contracts to help spur development of the necessary technologies. These contracts will be going to a veritable who’s who of major US tech firms: AMD, Cray, Hewlett Packard Enterprise, IBM, Intel, and NVIDIA. All told, the participating companies will be working over a three-year contract period, with the respective firms kicking in their own money – to the tune of at least 40% of the project cost – to help develop the technologies needed to build an exascale computer for 2021.
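To put the cost-sharing figure in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough illustration only, and it rests on one assumption: that "40% of the project cost" refers to the companies' share of the combined total (DOE money plus company money). The function and constant names are made up for this example.

```python
# Rough cost-sharing arithmetic for the PathForward awards.
# ASSUMPTION: the companies' "at least 40%" is a share of the combined
# project cost (DOE award + company contributions).

DOE_AWARD_MUSD = 258.0      # DOE contracts, in millions of US dollars
MIN_COMPANY_SHARE = 0.40    # minimum fraction covered by the companies


def minimum_total_investment(doe_award: float, min_company_share: float) -> float:
    """Smallest combined budget consistent with the stated cost share.

    If the companies cover at least `min_company_share` of the total,
    the DOE award is at most (1 - min_company_share) of it.
    """
    return doe_award / (1.0 - min_company_share)


total_musd = minimum_total_investment(DOE_AWARD_MUSD, MIN_COMPANY_SHARE)
company_musd = total_musd - DOE_AWARD_MUSD

print(f"Minimum combined investment: about ${total_musd:.0f} million")
print(f"Minimum company contribution: about ${company_musd:.0f} million")
```

Under that reading, the combined investment comes to at least roughly $430 million, with the six companies together putting in at least roughly $172 million of their own money.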
Overall, the DOE’s R&D program is intended to spur development in three areas: hardware, software, and application development. Hardware is of course the biggest issue: how do you build processors energy-efficient enough to do 1 exaFLOPS of work in under 30 megawatts, especially at a time when Moore’s Law is slowing down? Even then, how do you actually connect those systems together in a meaningful manner?
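The scale of that power target is easier to appreciate as an efficiency figure. The short Python sketch below works out the sustained efficiency implied by 1 exaFLOPS in 30 megawatts. The helper names are invented for this example, and the 10 GFLOPS-per-watt figure in the second calculation is a purely hypothetical comparison point, not a measurement of any real system.

```python
# Back-of-the-envelope efficiency math for the exascale power target.

EXAFLOPS = 1e18             # 1 exaFLOPS, in floating point operations per second
POWER_BUDGET_WATTS = 30e6   # the 30 megawatt budget cited for the system


def required_gflops_per_watt(flops: float, watts: float) -> float:
    """Efficiency needed to deliver `flops` within `watts`, in GFLOPS/W."""
    return flops / watts / 1e9


def power_needed_megawatts(flops: float, gflops_per_watt: float) -> float:
    """Power draw, in MW, to deliver `flops` at a given efficiency."""
    return flops / (gflops_per_watt * 1e9) / 1e6


target = required_gflops_per_watt(EXAFLOPS, POWER_BUDGET_WATTS)
print(f"Required efficiency: {target:.1f} GFLOPS per watt")

# HYPOTHETICAL comparison point: a machine that sustains 10 GFLOPS per watt.
hypothetical_mw = power_needed_megawatts(EXAFLOPS, 10.0)
print(f"Power at 10 GFLOPS/W: {hypothetical_mw:.0f} MW")
```

In other words, the whole system, including memory, interconnect, and storage rather than just the processors, has to average a little over 33 GFLOPS per watt to stay inside the budget.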
The answer to that is to pull together the nation’s largest hardware firms – all of whom already have supercomputer experience – and help them to develop the next level of technology. Unsurprisingly then, the plan calls for everyone to play to their strengths: Cray and IBM are working on system-level challenges, while HPE develops its Memory-Driven Computing architecture, which is based around byte-addressable non-volatile memory and new memory fabrics. Meanwhile Intel, AMD, and NVIDIA are all working on processor technology for the project, along with I/O technology in the case of Intel and AMD.
The DOE is still years away from awarding a contract for a complete system – and such a contract will inherently hinge on the outcome of the aforementioned R&D efforts – but at a very high level it’s easy to imagine what such a system will look like, based on the companies involved. The new systems already being brought online, such as Summit, make heavy use of GPUs and other wide processors, and at a pure processing level this looks likely to be a major component of exascale systems as well. Where these systems are likely to stray further off the beaten path is in storage/memory and interconnects, particularly in how these can be used to make an exaFLOPS’ worth of processors work together efficiently.
Not significantly discussed in today’s DOE announcement, but still a big part of the project, will be the software to run on these systems. The issue here is much the same as with the system interconnects: actually getting applications and libraries to scale to as many threads as it would take to fill an exascale system. Some of this will be on the application development side, while other parts will come down to building supporting libraries that are up to the task.
Finally, not to be overlooked are the stakes for the Exascale Computing Project itself. For the companies involved, these research contracts are likely to lead to lucrative computer contracts down the line. Meanwhile for the US DOE and other parts of the US government and industry, it’s a matter of both technology leadership and good old-fashioned national pride. China has already overtaken the US’s Titan supercomputer, taking the top two spots in the latest Top 500 list, and the country has its own plans to build an exascale computer for 2020 (and meanwhile, the US Committee on Foreign Investment is looking to further restrict Chinese investment in related fields). So for the US there is a need to keep pace with (and ultimately surpass) any competing systems so that the US maintains its leadership in supercomputer technology.