Deploying some of the world’s fastest supercomputers is among ASC’s accomplishments in advanced computing. However, it is not all about speed. Each new system is engineered to bring certain capabilities to bear on the problems of modeling and simulation that will enhance the overall goals of the Science-Based Stockpile Stewardship Program.
The ASC platform acquisition strategy includes two computing platform classes: Commodity Technology (CT) systems and Advanced Technology (AT) systems. The CT systems provide computing power to a large percentage of the design and analysis community by leveraging predominantly commodity hardware and software. The goal of these systems is to minimize software changes and maximize availability to end users. In contrast, the AT systems are the vanguard of the HPC platform market and incorporate features that, if successful, will become future commodity technologies. These large, first-of-a-kind systems will require application software modifications in order to take full advantage of the exceptional capabilities offered by new technology.
Prior to 2013, the ASC Program’s computer acquisition plan identified three classes of computing platforms: Capacity, Capability, and Advanced Architectures. In 2013, the ASC Computing Strategy revised the acquisition plan, replacing these with the two current platform classes, AT and CT.
The first CT systems provide over 7 petaflops of “capacity” computing capability to Los Alamos, Sandia, and Lawrence Livermore national laboratories. These commodity technology systems (CTS) are designed to run a large number of ‘jobs’ simultaneously on a single system. As was the case for the prior capacity systems, all CTS at each site run the Tri-lab Operating System Stack (TOSS, http://computation.llnl.gov/projects/toss-speeding-commodity-cluster-computing), which is based on Red Hat Linux. Experience has shown that deploying a common environment at all three sites greatly reduces the time and cost to deploy each HPC cluster. Scientists also appreciate the common look and feel when they use different systems.
The first AT system, Trinity, has been designed to provide increased computational capability for the NNSA Nuclear Security Enterprise in support of increasingly demanding workloads, e.g., increasing geometric and physics fidelities while maintaining expectations for total time to solution. The capabilities of Trinity are required to support the NNSA Stockpile Stewardship program’s certification and assessments, which ensure that the nation’s nuclear stockpile is safe, secure, and effective. The first Trinity hardware was delivered to Los Alamos in 2015, consisting of more than 190,000 cores with a peak performance of 11.5 petaflops (quadrillion floating-point operations per second). This initial delivery immediately ranked sixth on the TOP500 list. The full system, when complete in 2016, will contain more than 760,000 cores with a peak performance of 42.2 petaflops.
Beyond Trinity, ASC will be actively involved in the pursuit of exascale (1,000 petaflops) computing systems. Future AT systems are currently planned to alternate between the Lawrence Livermore (LLNL) and Los Alamos (LANL) sites: ATS-2 in 2018 (Sierra, LLNL); ATS-3 in 2021 (Crossroads, LANL); ATS-4 circa 2023; and ATS-5 circa 2025.
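To put the performance figures above on a common scale, the following short Python sketch converts the quoted peak numbers into floating-point operations per second and expresses them as a fraction of the exascale target. It relies only on the unit definitions given in the text (1 petaflops = one quadrillion operations per second; exascale = 1,000 petaflops). The function and variable names are illustrative assumptions and are not part of any ASC tool.

```python
# Unit definitions taken from the text:
#   1 petaflops = 10**15 floating-point operations per second
#   exascale    = 1,000 petaflops
FLOPS_PER_PETAFLOPS = 10**15
EXASCALE_PETAFLOPS = 1000


def petaflops_to_flops(petaflops: float) -> float:
    """Convert a peak rate in petaflops to operations per second."""
    return petaflops * FLOPS_PER_PETAFLOPS


def fraction_of_exascale(petaflops: float) -> float:
    """Express a peak rate in petaflops as a fraction of 1,000 petaflops."""
    return petaflops / EXASCALE_PETAFLOPS


# Peak figures quoted in the text (hypothetical labels, for illustration only).
quoted_peaks = {
    "Trinity, initial delivery": 11.5,
    "Trinity, full system": 42.2,
}

for label, peak in quoted_peaks.items():
    print(
        f"{label}: {petaflops_to_flops(peak):.3e} operations per second, "
        f"{fraction_of_exascale(peak):.2%} of exascale"
    )
```

These are peak figures only; sustained application performance depends on the workload and is not addressed by this calculation.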