High-performance computing

Philippe Fontaine - 6/06/2014

Yves Robert receives the IEEE TCSC Award for his work on high-performance computing

Yves Robert receiving the IEEE TCSC Award © Inria

Supercomputers make it possible to run programs designed to solve major scientific challenges, from the discovery of new proteins to climate modelling. They are composed of thousands of processors working in parallel and are governed by algorithms that constantly change to adapt to increasingly complex architectures. The algorithm researcher Yves Robert is one of the world's leading specialists in this discipline. He is also the first European to receive the IEEE TCSC Award, which recognises the work of a researcher in the field of high-performance computing.

What are your main research topics?

My work mainly involves the development of algorithms for high-performance computing (HPC) platforms. The architecture of the most powerful supercomputers is composed of thousands of processors, each with 8, 16 or even 64 cores. All these processors must work in parallel in order to squeeze the most out of the computing power. One of my research topics consists of creating algorithms for carrying out scientific calculations in parallel, more particularly linear algebra calculations, which by their very nature are highly sequential. This is a major challenge, as the resolution of linear systems currently represents almost 80% of the computing time of scientific applications.

You are also working on developing resilience techniques. What does that mean?

The more processors there are in a supercomputer, the greater the risk of one of them becoming faulty. If a processor stops working, the execution of a program launched several hours earlier is compromised. To avoid this, we are developing algorithms intended to limit the effects of faults and failures. One example is to take checkpoints when the processors are being used the least, in order to limit the slowdown of the program. The new supercomputers present us with an additional challenge. Their memories are subject to so-called "silent" errors, caused primarily by cosmic rays. These are difficult to detect and cause computing errors which corrupt the end result. The problem lies in developing algorithms capable of detecting precisely when the error occurred, so as to select a valid checkpoint from which to restart the computation.
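To give a rough sense of the trade-off described above, here is a minimal sketch in Python. It is not taken from the interview and does not represent Yves Robert's algorithms. It assumes independent failures, so that the mean time between failures (MTBF) of the whole platform is the MTBF of one processor divided by the number of processors, and it uses Young's classical first-order approximation for the time between two periodic checkpoints, sqrt(2 x checkpoint cost x platform MTBF). The function names and the numerical values (a 10-year MTBF per processor, a 10-minute checkpoint) are hypothetical and chosen only for illustration.

```python
import math


def platform_mtbf(processor_mtbf_hours: float, num_processors: int) -> float:
    """MTBF of the whole machine, assuming independent processor failures."""
    return processor_mtbf_hours / num_processors


def young_period(checkpoint_cost_hours: float, mtbf_hours: float) -> float:
    """Young's first-order approximation of the time between checkpoints.

    Only meaningful when the checkpoint cost is small compared to the MTBF.
    """
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


if __name__ == "__main__":
    processor_mtbf = 10 * 365 * 24  # hypothetical: 10 years, in hours
    checkpoint_cost = 10 / 60       # hypothetical: 10 minutes, in hours

    for n in (1_000, 10_000, 100_000):
        mtbf = platform_mtbf(processor_mtbf, n)
        period = young_period(checkpoint_cost, mtbf)
        print(f"{n:>7} processors: platform MTBF {mtbf:7.2f} h, "
              f"checkpoint every {period:5.2f} h")
```

Under these assumptions, the platform MTBF shrinks in proportion to the number of processors, so a larger machine has to checkpoint more often. This sketch covers periodic checkpointing against fail-stop failures only; it does not model the choice of low-activity moments or the silent errors mentioned in the interview.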

When did you begin your work on algorithms for HPC?

In 1982. But HPC platforms did not yet exist in their current form when I first took an interest in algorithms for solving linear systems. At the time, computers worked according to the principle of shared memory. Most researchers did not believe in distributed-memory parallel processing, in other words machines in which each processor has its own memory and can communicate with all the others. We were lucky, because such supercomputers eventually became the norm. Since then, we have been constantly monitoring the development of technologies in order to propose the most suitable algorithms for the new architectures. My work on resilience is more recent: I have been working on it for three or four years now. In this field, I develop algorithms designed to meet the needs of future exascale supercomputers, which could be here before the end of the decade.

You are the first European to receive the IEEE TCSC Award. What does this award mean to you?

It is a great pleasure for me. It is gratifying to be recognised by my peers for my work and my service to the scientific community.

Keywords: IEEE High-Performance Computing Yves Robert ROMA Supercomputer