Intro to Longleaf and Pine: Carolina’s Newest High-End Clusters
Though there is no consensus, two principles capture the gist of high-end computing:
- Define a computational problem
- When computing to solve that problem, minimize the time spent moving bits of information around.
The problem itself also matters.
Sometimes a problem needs lots of cores and needs them to communicate with one another as they collectively crunch on it: modeling the gravitational forces of a black hole’s accretion disk, or the forces within a molecule of a few hundred thousand atoms, takes a couple thousand cores, as does predicting where a hurricane will go or how quickly ice sheets might melt. Such problems have been the traditional diet of high-performance compute clusters for decades: make the cores fast and homogeneous, and make them talk to one another about whatever they are doing as fast as possible, too.
Not all computational problems are like that. Some don’t require a big collection of cores from the get-go; or they can be broken into smaller independent problems so fewer cores need to talk to one another. The complexity of computational problems varies dramatically.
Then comes the kicker: much of the time, our computational problems aren’t merely computational. Indeed, their point is exactly to process data that characterize phenomena and systems of phenomena around us, whether behavioral, social, medical, physical, organic, environmental, etc.
The challenge here isn’t so much about the sheer speed of compute cores nor how quickly cores can talk to one another. Rather, the challenge is getting the relevant data—the bits of information—to the cores. If the memory isn’t big enough, then there’s a lot of moving bits back and forth between storage and memory. Or maybe the memory is big enough, but the storage system just cannot send data fast enough, or enough of it. Either way, the compute cores have to wait. The computational problem doesn’t require more cores. Making the cores talk to one another better doesn’t help much either.
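To make the point concrete, here is a minimal sketch (not Longleaf code; the file format and function are invented for illustration) of the data-intensive pattern described above: the computation per byte is trivial, so the job’s real cost is moving bytes from storage into memory. Streaming the file in fixed-size chunks keeps memory bounded no matter how large the input grows.

```python
import os
import tempfile

def count_matching_rows(path, needle, chunk_size=1 << 20):
    """Stream a large text file in fixed-size chunks so memory stays bounded.

    The compute per byte is trivial (a substring test); the dominant
    cost is moving bytes from storage into memory.
    """
    matches = 0
    with open(path, "r") as f:
        leftover = ""
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            lines = (leftover + chunk).split("\n")
            leftover = lines.pop()  # partial last line carries over to next chunk
            matches += sum(1 for line in lines if needle in line)
        if needle in leftover:
            matches += 1
    return matches

# Small demonstration with a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\nalpha beta\ngamma\n")
    path = tmp.name
print(count_matching_rows(path, "alpha"))  # 2
os.remove(path)
```

The same logic written to load the whole file first would run identically on a small input but stall, or fail outright, once the data outgrows memory, which is exactly the bottleneck at issue.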
In other words, the overriding performance problems aren’t computational. They are that the cores are spending time shipping bits around, and waiting for the bits to get from one place to another—to a place where simple computational tasks can process those bits.
Longleaf explicitly embraces the idea that different workloads exercise different components of high-end computing resources. Broadly speaking, the traditional approach maximizes the homogeneity of compute node types, even if it allows for some isolated capabilities to handle exceptions. The Longleaf strategy instead embraces heterogeneity: different types of compute nodes are available for different workloads. The principal goal is to analyze job submissions and the actual resource utilization of jobs, establish classes of stable workload types, and dispatch jobs to different resources based on their attributes.
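As an illustration of routing jobs to node classes, here is a hypothetical batch script for a SLURM-style scheduler. The partition name (`bigmem`), the resource figures, and the program name are all invented for this sketch, not Longleaf’s actual configuration.

```shell
#!/bin/bash
# Hypothetical sketch: steering a memory-hungry job to a large-memory
# node class under a SLURM-style scheduler. The partition name and
# resource figures are illustrative, not Longleaf's actual ones.
#SBATCH --job-name=variant-calling
#SBATCH --partition=bigmem       # node class chosen to match the workload
#SBATCH --mem=500G               # request reflects measured, not guessed, usage
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=24:00:00

./process_samples --threads "$SLURM_CPUS_PER_TASK"
```

The point of the strategy described above is that, over time, attributes like these can be derived from observed utilization rather than supplied by guesswork.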
Longleaf becomes more than a computational and information-processing resource—it also becomes an observation platform. The strategy is to analyze, evaluate and optimize the resources for UNC-Chapel Hill researchers based upon the full breadth of actual workload and demand, rather than optimize for a select segment or idealized dream of that workload traditional to high-performance computing.
To oversimplify, Longleaf is designed to assume that information-processing demands are mixed and heterogeneous, to present compute resources that suit that mixed workload, and to deliver jobs to compute resources optimized for each type. The mixed and heterogeneous workload is the norm, not the exception. To that, our goal is to add analytics methods that recommend job-submission attributes based upon actual activity, and perhaps even predict and adapt job submissions to optimize ensembles of workload. A new approach to provisioning and configuration management also opens the future possibility of dynamically presenting additional compute resources as new capabilities emerge (whether in the cloud or otherwise).
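The kind of analytics described above can be sketched very simply. The example below is a deliberately naive illustration, not Longleaf’s actual method: given the observed peak memory of past jobs in the same workload class, it recommends a request based on a high percentile plus a safety margin, so that one outlier job doesn’t inflate everyone’s reservation.

```python
def recommend_memory_gb(peak_usage_gb, headroom=1.2):
    """Recommend a memory request from observed peak usage of past jobs
    in the same workload class: a high percentile plus a safety margin.

    A deliberately simple sketch of workload analytics; a real
    recommender would weigh many more job attributes than this.
    """
    if not peak_usage_gb:
        raise ValueError("need at least one observation")
    ordered = sorted(peak_usage_gb)
    # 95th-percentile index (nearest-rank method)
    idx = max(0, int(0.95 * len(ordered)) - 1)
    return round(ordered[idx] * headroom, 1)

# Observed peak memory (GB) for a class of similar past jobs;
# note the single 40.1 GB outlier does not dominate the recommendation.
history = [12.0, 14.5, 13.2, 40.1, 15.0, 14.8, 13.9, 16.2, 14.1, 15.5]
print(recommend_memory_gb(history))  # 19.4
```

Even this toy version captures the shift in perspective: the recommendation comes from what jobs actually did, not from what users guessed they would need.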
Also looking forward, more customary high-performance-computing (HPC) users and groups need not fear neglect. A companion piece to this strategy arrives as we turn to replacing KillDevil in the coming fiscal year. With Longleaf and Pine, we have a platform optimized for mixed workloads, especially those typical of data-intensive processing tasks. KillDevil’s replacement—yet to be named—will be designed, tuned, and optimized specifically for the parallel workloads that have been the sine qua non of HPC for years. Whereas Longleaf and Pine leapfrog our peers with respect to one suite of capabilities for data-intensive research, our next aim is to perform a similar feat for the bread-and-butter of HPC.