Towards reproducible software environments in HPC with Guix
Guix is free software, developed under the auspices of the GNU Project by a growing community of enthusiasts and organizations: currently between 40 and 50 people contribute each month. It is used to reproduce software environments. Recently, the Inria Bordeaux – Sud-Ouest Research Centre, the Max Delbrück Center for Molecular Medicine in Berlin and the Utrecht Bioinformatics Center in the Netherlands decided to undertake a joint effort using this software. What do the three institutions have in common? They all use, or have users of, high-performance computing (HPC) software, and in these institutions, as in many others, the ability to reproduce experiments is a stringent necessity. Guix appears to be one of the solutions.
Reproducibility is an important topic of scientific discussion. In computer science, more and more experiments rely on complex sets of software. The ability to reproduce an experiment thus depends on the ability to reproduce the software environment. In this context, the National Science Foundation (NSF) in the United States now encourages the reproduction of experiments carried out by the HPC community, and journals such as Nature are also insisting on the importance of sharing source code and supporting reproducibility. Some HPC conferences, such as SuperComputing, have guidelines on the subject. Finally, Inria's latest strategic plan devotes an entire chapter to the topic. Alongside Guix, there are two other types of tools: “traditional” package managers and “containers”.
- Package managers give the person who seeks to reproduce an experiment a description of what is required (applications, libraries, compilers, etc.) and the source code of the software used, when it is free. Package managers are well known in the GNU/Linux world but are often inflexible and reserved for the machine's system administrators. Some research teams have developed package managers, such as EasyBuild and Spack, that are installed on top of the system's own manager, thus providing users with greater flexibility. These tools cannot, however, exactly reproduce the software environment, since they themselves depend on software that is already present on the system.
- Container solutions, such as Docker and Singularity, provide standalone images that include not only the application used for the experiment, but also all of the software that it requires, including part of the operating system. It's like receiving a brand new computer where everything has been preset and you don't have to do a thing. This approach achieves total reproducibility, but matters become more complex if you wish to make a small modification to the experiment to test another hypothesis, which happens frequently in the world of research!
Guix thus offers an alternative to traditional package managers and containers. It can be used to reproduce a software environment without relying on the system's package manager. The Inria Bordeaux – Sud-Ouest research centre, the Max Delbrück Center for Molecular Medicine in Berlin and the Utrecht Bioinformatics Center seek to optimise this software for HPC. This is done by adding packages for HPC software developed and used at each of the institutions, but also, and above all, by adding features that facilitate its use on computing clusters and support reproducible workflows.
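The limitation noted above for package managers installed on top of the system's own manager can be illustrated with a small sketch. It is not taken from Guix, EasyBuild or Spack and does not describe how any of them is implemented; the package names, versions and function names are hypothetical. It only shows that a description which leaves out the software provided by the host system cannot distinguish two machines whose system libraries differ, whereas a description covering every dependency can.

```python
import hashlib


def fingerprint(package, dependency_fingerprints):
    """Hash a package label together with the fingerprints of its dependencies."""
    digest = hashlib.sha256(package.encode("utf-8"))
    for dep in sorted(dependency_fingerprints):
        digest.update(dep.encode("utf-8"))
    return digest.hexdigest()


def environment_fingerprint(application, libraries, system_libraries, include_system):
    """Fingerprint an environment, optionally ignoring what the host system provides."""
    deps = [fingerprint(lib, []) for lib in libraries]
    if include_system:
        deps += [fingerprint(lib, []) for lib in system_libraries]
    return fingerprint(application, deps)


# Hypothetical package names and versions, for illustration only.
application = "solver@1.0"
libraries = ["mpi@4.0", "blas@3.9"]
host_a = ["libc@2.27"]  # provided by the system on machine A
host_b = ["libc@2.31"]  # provided by the system on machine B

# A description that leaves system software out cannot tell the machines apart...
partial_a = environment_fingerprint(application, libraries, host_a, include_system=False)
partial_b = environment_fingerprint(application, libraries, host_b, include_system=False)
print(partial_a == partial_b)  # True

# ...whereas a description covering every dependency exposes the difference.
full_a = environment_fingerprint(application, libraries, host_a, include_system=True)
full_b = environment_fingerprint(application, libraries, host_b, include_system=True)
print(full_a == full_b)  # False
```

In this toy model, two environments can only be called identical when everything they depend on, down to the system libraries, is part of the description.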
Ricardo Wurmus, system administrator of the Scientific Bioinformatics Platform at the Max Delbrück Center for Molecular Medicine, uses Guix: "Before Guix the installation of scientific software was necessarily ad-hoc. Groups would build their own software, statically link it, and hope that it will never have to change (because management of software environments was virtually impossible). Not only can we now manage a single environment per group in a reliable fashion; we use Guix for environments at all levels: group, project, user, workflow… etc."
At the Inria Bordeaux – Sud-Ouest research centre, Ludovic Courtès, an engineer in the Experimentation and development Section, is responsible for optimising the software for HPC. With support from an Inria technology development initiative, the long-term objectives are for the software to meet the HPC requirements of the Centre's research teams and to be compatible with computing clusters such as PlaFRIM, which is hosted and used at the Bordeaux centre.
The project is scheduled to last two years. By the end of that period, the project's initiators hope to have met the software reproducibility needs of their institutions. The wider objective is to convince other HPC decision makers of the advance that this approach represents.
See the Guix-HPC website for more information!