Organizing Computing on Parallel Machines
The software Scotch, developed by François Pellegrini at Inria Bordeaux – Southwest, turns 20 this December. It is used to organize computing efficiently and robustly on parallel computers, and its success keeps growing: it is the only software of its kind in the world that runs on hierarchical parallel architectures comprising tens of thousands of processors.
Interview with François Pellegrini, professor at University Bordeaux 1, member of the Bacchus team at Inria Bordeaux – Southwest and member of the LaBRI.
What is the software Scotch for?
If a calculation task is too large, it cannot be performed on a single computer: there is not enough memory, and the calculation would take far too long. This is true, for example, when you want to compute the structure of a car, the aerodynamics of a plane, or the weather forecast.
In such a case, multiple computers or processors are used simultaneously, i.e. in parallel. The question this raises is how to distribute the tasks to be performed among those processors as efficiently as possible. As when organizing an office, this means achieving a balanced distribution, so that every processor finishes its share at roughly the same time. It also means keeping the information exchanges needed to complete a task as fast as possible: you shouldn't have to walk to the other end of the department every five minutes to fetch the information you need to move forward. Scotch computes this distribution so that the capabilities of a parallel computer made up of many processors can be used optimally.
Scotch is a "graph partitioner": all of the computing tasks to be assigned can be seen as a graph, whose vertices represent the individual calculations and whose edges represent the dependencies between calculations. Finding a good task distribution amounts to breaking that graph into parts that contain approximately the same number of vertices, while keeping the number of edges that run between those parts as small as possible.
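The two quantities a partitioner must trade off can be made concrete with a toy example. The sketch below is purely illustrative (it is not Scotch's algorithm or API): it takes a small graph and a candidate two-way split, then measures the balance (equal vertex counts) and the edge cut (dependencies crossing between parts).

```python
# A small graph: each vertex maps to the set of its neighbors.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

# A candidate 2-way partition: vertex -> part number.
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

# Balance: the two parts should hold roughly the same number of vertices.
sizes = [sum(1 for v in part if part[v] == p) for p in (0, 1)]

# Edge cut: count each edge whose endpoints lie in different parts
# (divide by 2 because each edge is seen from both of its endpoints).
cut = sum(1 for v, nbrs in graph.items()
          for w in nbrs if part[v] != part[w]) // 2

print(sizes, cut)  # parts of size [3, 3] with an edge cut of 1
```

Here the split {0, 1, 2} / {3, 4, 5} is perfectly balanced and cuts only the single edge 2–3, which is exactly the kind of solution a partitioner searches for on graphs with billions of vertices.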
In what major ways has the software changed in 20 years?
Though the idea of running multiple processors in parallel is an old one, it was in the mid-80s that we began building parallel architectures usable for industrial purposes. Although the calculations of the time were very small compared to what we process today, they nonetheless required parallel processing because of the weak performance of the processors back then. Even so, the graphs representing these problems were small enough to be handled on a single computer. The partitioning could therefore be computed with sequential algorithms on one machine, and the goal was to find good sequential algorithms for partitioning those graphs.
Ten years later, scientific problems had grown so large that the graph itself no longer fit in a single computer's memory. Its partition therefore had to be computed in parallel, and the challenge became developing good parallel partitioning algorithms.
Does Scotch face other challenges today?
As the size of the problems to be solved keeps increasing, the number of processors grows accordingly. Processing the largest problems currently being handled now takes tens of thousands of processors, and systems containing millions of processors are being built!
Under these conditions, it is not possible to operate uniformly, i.e. to have each processor communicate in the same way with all the others. A hierarchical organization and architecture has been adopted instead. To continue my office analogy, the agents that carry out the processing are grouped into divisions, which are themselves divided into departments, then into offices. Communication between offices of different divisions must go through the hierarchy. The goal is still to distribute the work equally, but in a way that takes the system's architecture into account so that most data exchanges stay local. In this context, we speak of "placing" tasks rather than "partitioning".
Are you involved in this research?
This is research that we are currently working on. We have a lead over our competitors because the earliest algorithms we developed, in the 1990s, were designed for hierarchical parallel machines, also called "non-uniform" or "heterogeneous" machines. At the time, no one knew how to build uniform parallel machines with more than 32 processors. Not long after, technological progress made it possible to build uniform-architecture machines with up to 512, and later 1,024, processors. Scotch's heterogeneity features have therefore seen very little use, but they do exist!
This means that Scotch can place tasks on heterogeneous systems, even though it was not designed to work optimally on those architectures. Placement takes more calculation time than it should, but on architectures of about 80,000 processors, the placement it produces can reduce calculation time by 10 to 15% compared to a partitioner designed for a uniform architecture.
Is Scotch original compared to competitors like MeTiS, which is widely used in the United States?
Scotch is the only partitioner that can place tasks across heterogeneous parallel architectures. For now, this placement is sequential, but we already have a prototype that performs parallel placement for this sort of architecture. We were also the first to offer a 64-bit version for processing very large meshes. Scotch will probably be the first software able to place graphs with several billion vertices across hundreds of thousands of processors while accounting for the architecture's heterogeneity. Katie Lewis, a researcher at Lawrence Livermore National Laboratory and a user who works on this sort of very large problem, hopes to cut calculation time by 30% this way.
Scotch is free software, but it wasn't always. Why is that?
Scotch has been distributed under the CeCILL-C free license since version 4.0, released in 2006. This became essential to ensure that the software was widely distributed: until then, it had been used by only a small number of partners, including CEA, even though it could serve a very large number of people. Since then, this version has been downloaded more than 2,500 times and has been incorporated into other software, such as the Zoltan module of the Trilinos software developed by the U.S. Department of Energy laboratories, and the MUMPS solver developed jointly by teams in Lyon, Toulouse and Bordeaux.