AI EDUCATION: What Is Exascale Computing?


Each week we find a new topic for our readers to learn about in our AI Education column. 

Less than a century ago, modern computing was launched in the vacuum-tube-and-punch-card era, with machines that took up entire floors, or even multiple floors, of a building and could boast of being the most powerful computational devices ever built.  

Today’s computers are much different from those early machines—most of us carry very sophisticated computers around in our pockets, or wrapped around our wrists—but the most powerful devices still take up huge amounts of space. 

Welcome to another edition of AI Education, where we’re going to discuss exascale computing, the latest big benchmark achieved in supercomputing. Rather than give our usual drawn-out introduction, we’ll jump into a definition right away. Exascale computing is supercomputing that takes place at a level of at least 1 exaFLOPS. 

Now, that’s about as clear as mud to a layman, so we’re going to need a couple more definitions to get this piece going. 

What Is Supercomputing, and What Is an ExaFLOP? 

Supercomputing is the development and operation of supercomputers. Supercomputer is a term most of us have heard hundreds if not thousands of times but have never had properly defined. In writing these pieces, we’ve adopted our own definition of supercomputer that we’re going to set aside for today so we can offer a more formal definition. 

Let’s divide computers into general-purpose computers (the computers on our wrists and in our pockets, as well as on our laps and desks) and supercomputers, which are devices specifically designed for higher levels of performance than their general-purpose peers. Supercomputers are used for complex tasks, like predicting the weather, mineral exploration and detailed simulations. Supercomputers are centralized, tightly integrated systems built with arrays of central processing units (CPUs), graphics processing units (GPUs) and other microprocessors capable of parallel processing. 
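To make the parallel-processing idea a little more concrete, here is a toy sketch in Python (our own illustration, not anything a real supercomputer actually runs): one job is split across a handful of worker processes, the same basic pattern that a supercomputer applies across thousands of CPUs and GPUs at once.

    # Toy illustration only: split one big sum across several worker processes,
    # the same divide-the-work idea supercomputers use at vastly larger scale.
    from multiprocessing import Pool

    def partial_sum(bounds):
        start, end = bounds
        return sum(i * i for i in range(start, end))

    if __name__ == "__main__":
        n = 8_000_000
        step = n // 4
        chunks = [(i, i + step) for i in range(0, n, step)]  # four equal slices of the work
        with Pool(processes=4) as pool:                      # four workers stand in for thousands of nodes
            total = sum(pool.map(partial_sum, chunks))
        print(total)                                         # same answer as one long loop, computed in parallel

The difference, of course, is that a supercomputer coordinates vastly more workers over purpose-built, high-speed interconnects rather than four processes on a single machine. 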

FLOPS is kind of a funny acronym for floating-point operations per second. While it sounds like something we might do after work wears us out, in technology it means work: it is a measure of computer performance. Today’s general-purpose desktop computers generally perform somewhere in the range of hundreds of gigaFLOPS (10¹¹ FLOPS) to tens of teraFLOPS (10¹³ FLOPS), meaning 100 billion to 10 trillion floating-point operations per second. An exascale computer operates at or above 1 exaFLOPS (10¹⁸ FLOPS), which means 1 quintillion floating-point operations per second. To give us an idea of scale, an exascale computer would be about 100,000 times more powerful than a very powerful 10 teraFLOPS general-purpose desktop computer. 
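That comparison is simple enough to check in a few lines of Python; the figures are the illustrative ones used above, not measurements of any particular machine.

    # Back-of-the-envelope scale check (illustrative numbers only).
    TERA = 10**12
    EXA = 10**18

    desktop_flops = 10 * TERA       # a very powerful desktop: 10 teraFLOPS
    exascale_flops = 1 * EXA        # the exascale threshold: 1 exaFLOPS

    print(exascale_flops // desktop_flops)  # 100000 -> roughly 100,000 times faster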

What About Distributed Computing? 

In the past, we have referred to achievements in distributed computing, where powerful networks of independently located, powered and operated computers work together to deliver supercomputer-like processing capabilities, as supercomputing achievements. While merging the terms is useful when discussing cloud computing and artificial intelligence, in computer science, supercomputers—and thus exascale computing—are distinct from distributed computing. The need to move and store data in distributed computing creates bottlenecks that traditional supercomputers do not face. 
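To see why data movement matters, here is a rough back-of-the-envelope sketch; every figure in it is an assumption chosen for illustration, not a measurement of any real system.

    # Illustrative comparison of compute time vs. data-movement time
    # (all figures are assumptions made for this sketch).
    data_bytes = 10**12              # 1 terabyte of input data
    flops_per_byte = 10              # assume 10 floating-point operations per byte of data

    compute_rate = 10**15            # assume 1 petaFLOPS of local computing power
    network_rate = 10 * 10**9 / 8    # assume a 10-gigabit link between sites, in bytes per second

    compute_seconds = data_bytes * flops_per_byte / compute_rate
    transfer_seconds = data_bytes / network_rate

    print(compute_seconds)   # 0.01 -> a hundredth of a second of arithmetic
    print(transfer_seconds)  # 800.0 -> hundreds of seconds just moving the data between sites

Under those assumptions the arithmetic itself is nearly instant, while shipping the data between locations takes hundreds of seconds, which is exactly the kind of bottleneck a tightly integrated supercomputer is built to avoid. 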

Crossing the 1 ExaFLOP Barrier 

The 1 exaFLOP barrier was first broken in October 2018 at Oak Ridge National Laboratory, between Knoxville and Chattanooga, Tennessee, by the Summit supercomputer as it was processing genomic data. However, this was not considered an example of exascale computing, as the achievement was marked using alternative forms of measurement. In 2020, the exaFLOP barrier was broken in distributed computing. 

In 2022, the U.S. officially brought the world into the exascale computing age with Frontier, a computer built at the Oak Ridge Leadership Computing Facility, where it occupies 7,300 square feet of space. Frontier is largely employed in scientific R&D. The system requires about 21 megawatts of power to function, enough to power 15,000 residential homes. Believe it or not, that power use was revolutionary for its time: until Frontier was announced, computer scientists had estimated that it might take up to 500 megawatts to power an exascale computer, which would be a major impediment to their construction and operation. Thus, Frontier not only ushered in an era of exaFLOP-plus computers, but also an era of more efficient supercomputing. 
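A quick ratio shows how large that efficiency gap is; the wattage figures come from the paragraph above, and the 1 exaFLOPS threshold is used as a stand-in for sustained performance.

    # Rough efficiency comparison using the figures above (illustrative only).
    one_exaflops = 10**18           # the exascale threshold, in FLOPS

    frontier_watts = 21 * 10**6     # Frontier's reported draw: about 21 megawatts
    feared_watts = 500 * 10**6      # the pre-Frontier estimate: up to 500 megawatts

    print(one_exaflops / frontier_watts)  # ~4.8e10 FLOPS per watt
    print(one_exaflops / feared_watts)    # 2e9 FLOPS per watt, roughly 24 times less efficient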

Frontier was the world’s most energy-efficient supercomputer from May 2022 until September 2022, but it remained the most powerful supercomputer in the world until November 2024, when it was dethroned by a new exascale system: El Capitan. 

Hosted at the Lawrence Livermore National Laboratory in Livermore, Calif., El Capitan was officially launched in February of this year and remains to this day the most powerful supercomputer in the world, capable of offering a peak of 2.746 exaFLOPS of processing power. El Capitan takes up about 7,500 square feet of space and consumes about 30 megawatts of power. The system handles various tasks related to the U.S. nuclear weapons stockpile. 

What Does Exascale Computing Have to Do with AI? 

Public-facing AI generally runs on distributed computer systems and is offered via the cloud. We access these tools via Amazon, Microsoft Azure or ChatGPT’s web interface, and the bulk of the computing is done across multiple data centers that may or may not be adjacent to each other. However, some uses might call for a dedicated supercomputer: when very large amounts of dedicated resources are needed, as might be the case with some institutional scientific research into particle physics or astronomy; in complex and crucial commercial research such as drug discovery and molecular science; or in national defense use cases where both security and processing power are priorities. 

Exascale supercomputers offer the ability to perform very complicated calculations, or massive numbers of smaller calculations, simultaneously. Thus, these computers are able to run massive, complex and extremely sophisticated AI models with trillions of parameters. The more powerful the computer, the bigger the data sets it can process and analyze. Exascale computers don’t just allow researchers to construct realistic scientific simulations and models; they can also analyze those models in turn, helping surface insights and expedite discoveries. 
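To give a sense of why that scale matters, here is a rough sizing sketch for a hypothetical 1-trillion-parameter model; the byte counts are assumptions, not the specifications of any actual system.

    # Rough sizing of a hypothetical 1-trillion-parameter AI model (assumed figures).
    params = 10**12              # one trillion parameters
    bytes_per_param = 2          # assume 16-bit (2-byte) weights

    weight_terabytes = params * bytes_per_param / 10**12
    print(weight_terabytes)      # 2.0 -> about 2 terabytes just to hold the model's weights,
                                 # far more memory than a typical desktop offers

Training a model of that size can multiply the memory and compute requirements many times over, which is where exascale-class resources come in. 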

Today’s publicly available AI models do not access exascale technology in their operations, nor have they been trained with exascale supercomputers. However, future generations of AI will have the sophistication and power of exascale computing behind them.