Grid'5000: large-scale computing experimentation

Present in Lille since 2005 and hosted by the Inria Lille – Nord Europe research centre since 1 March 2017, the Grid'5000 test bed gives scientists the opportunity to test their software concepts under real-world conditions, with substantial computing power and storage capacity. Some calculations would take years on a single machine; by relying on grid computing, the Grid'5000 test bed cuts this to a few days, or even a few hours.

The objective of Grid'5000 is to promote and develop experimental research on large-scale computing systems, such as high performance computing (HPC), cloud computing and big data. “Researchers use it to experiment with the software tools they develop to meet the growing needs of applications in terms of HPC, data volume, and ease of deployment and use via the cloud,” explains Nouredine Melab, chief scientist for Grid'5000 at the Lille site and researcher on the Dolphin team* at the Inria Lille – Nord Europe research centre.

Supported by Inria, CNRS, several regional councils and universities, the test bed provides its users with a highly controllable environment for reproducible experiments on more than 10,000 computing cores. It can therefore handle very large problems efficiently in fields such as artificial intelligence.

Here is a concrete example: ten years ago, while working on his dissertation, a doctoral student co-advised by Melab reduced the time needed to solve a scheduling problem from 22 years on a single machine to only 25 days by using the computing resources of seven sites. “Since then, we've brought it down to nine days with more efficient graphics cards. And in the weeks ahead, it will take only a few hours!” exclaims the scientist.

Made up of resources distributed over eight sites (one in Luxembourg and seven in France), the test bed makes it possible to share resources. “It also encourages synergies between IT research teams and the creation of multi-site collaborative projects,” notes the researcher.

CPER Data funding: essential accelerator

To support the project's continuation, the test bed will receive more than one million euros in funding through Inria under the State-Region Contract (CPER) for Data. In the words of the test bed's chief scientist: “The goal is to develop research around data science that is linked to the regional economic fabric via, among other things, joint Inria-industry laboratories,” in keeping with Inria's priority mission of technology transfer. Another aim of the funding is to encourage high-quality research that is recognised internationally. Within this framework, there are three priority areas: the Internet of Things, knowledge and data intelligence and its optimisation, and HPC. These areas rest on four pillars: attractiveness, demonstrators (to validate a product), technology transfer to SMEs, and the research infrastructures to which Grid'5000 belongs. It was during this initial funding phase that the test bed was installed at Inria Lille two months ago, with multi-core computing servers and PDUs (power distribution units, used to measure how much energy Grid'5000 consumes). “Later, we plan to add graphics accelerators, coprocessors and a storage cluster,” explains the researcher.

This state-of-the-art environment is made available to researchers and engineers, for example those working in research laboratories. The test bed has 600 users each year. “We are trying to attract teams that work on grid systems,” he explains.

Numerous students trained

Training also plays an important role. Since 2005, more than 500 master's students in scientific computing at the University of Lille – sciences and technologies have been trained in cluster use and parallel programming through practical work on the test bed. Several young engineering graduates have also been recruited to work on Grid'5000. At the Lille site, a young engineer has been in charge since late 2015 on a two-year contract, and another young graduate will replace him when it expires. There are also plans to hire an engineer for system and network administration as well as for software development to support the Lille teams. “The engineers we train for two years become true experts in their field and easily find positions in the private or public sector,” notes Melab.

What other challenges will Grid'5000 be taking on? “The primary challenge is scientific: coping with the major changes of scale in applications. Numerous problems need to be addressed, including scaling itself, the management of different types of resources, and energy consumption. It will also be important to put the research carried out around Grid'5000 at the service of technology transfer and society. This will include defining an economic model that governs business access to Grid'5000,” explains Melab.

“We're currently working on a link with FIT (editor's note: the Future Internet of Things, a test bed for connected-object technologies). According to Frédéric Desprez, Scientific Director of Grid'5000, the stated objective is to build a major instrument, SILECS, to meet all the scientific challenges posed by the Internet of Things, data centres and the networks that connect them. At a realistic scale, this versatile test bed will anticipate future infrastructures, making it possible to test models, algorithms and languages,” confirms Melab.

*Dolphin is a joint research team with the University of Lille – sciences and technologies, within CRIStAL, joint research unit UMR 9189 of CNRS, Centrale Lille and the University of Lille – sciences and technologies.

Keywords: Training, Large-scale software development, Supercomputer, Grid computing, State-Region Contract (CPER) for Data