Can you tell us about your background before joining Inria?
After a baccalauréat in Mathematics and Technology and technical preparatory classes, I joined the Ecole Polytechnique in 1989.
I had already been programming for a long time, but there I discovered the scientific side of computer science, especially through several teachers from Inria. This is what prompted me to continue with a master's degree and a thesis in computer science. I then did a post-doc at the MIT Artificial Intelligence Laboratory, before joining Inria as a researcher in 1998.
What is your field of research at Inria?
I am interested in shape analysis, and in the variability of human organs in particular, a field called computational anatomy. The mathematical problems that these morphological statistics raise are particularly interesting because one cannot add or subtract shapes. That is why we must reinvent statistical methods to work in these non-linear spaces. The potential applications in medicine are numerous because shape statistics make it possible to encode a priori knowledge about normal or abnormal anatomy.
What is "G-Statistics", your project selected by the ERC?
G-statistics aims to explore, through geometry, the consequences of the non-linearity of data spaces on statistical estimation. We already know how to estimate the location (mean, median) and the concentration (covariance) of a random variable in a Riemannian manifold, or to perform simple statistical tests. There are also results for some classes of less smooth spaces, for instance length spaces of non-positive curvature. One of the objectives of geometric statistics is to unify these methods and to extend them to other, non-Riemannian geometric structures. We want to include spaces with singularities and changes of dimension, in particular affine connection spaces, quotient spaces and stratified spaces. These geometric structures appear in practical life sciences applications, for example diffeomorphisms (invertible transformations of space) acting on images in medical image registration, phylogenetic trees, or shape spaces.
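To make the idea of estimating a mean in a Riemannian manifold concrete, here is a minimal sketch that computes a Fréchet mean (the point minimizing the sum of squared geodesic distances to the data) on the unit sphere. It is an illustration under stated assumptions, not the project's own code: the function names `sphere_log`, `sphere_exp` and `frechet_mean` are hypothetical, the fixed-point iteration with unit step is one common choice among several, and the data are assumed to be concentrated enough (well within a hemisphere) for the mean to be unique.

```python
import numpy as np

def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p pointing towards q,
    with length equal to the geodesic distance between p and q."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    v = q - cos_theta * p              # component of q orthogonal to p
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:                 # q equals p (or is antipodal: undefined, return 0)
        return np.zeros_like(p)
    return np.arccos(cos_theta) * v / norm_v

def sphere_exp(p, v):
    """Exp map on the unit sphere: follow the geodesic from p along tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def frechet_mean(points, n_iter=100, tol=1e-10):
    """Fixed-point iteration: average the data in the tangent space at the
    current estimate, then map the average back onto the sphere."""
    mean = points[0] / np.linalg.norm(points[0])
    for _ in range(n_iter):
        step = np.mean([sphere_log(mean, q) for q in points], axis=0)
        if np.linalg.norm(step) < tol:
            break
        mean = sphere_exp(mean, step)
    return mean
```

In this sketch the exp and log maps play the roles that addition and subtraction play in Euclidean statistics, which is why the arithmetic mean has to be replaced by an iterative scheme.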
One of the key points I want to focus on is the impact of curvature, singularities and stratifications on the quality of the statistical estimation. This is especially important in the non-asymptotic regime because the amount of data is always finite in practice. For example, curvature influences the concentration of an estimate, and the gradient of the curvature can induce a bias. When the data are sufficiently concentrated with respect to the curvature, these changes with respect to Euclidean statistics are not necessarily very large, but when one approaches a singularity, the curvature can become infinite and its impact becomes drastic.
A second aspect concerns data dimension reduction. It is often assumed that high-dimensional data actually live on a low-dimensional manifold (the manifold hypothesis). However, this assumption is often wrong because the optimal dimension depends on the scale at which the data are approximated, and stratifications may appear. I think it is more interesting to construct a sequence of nested subspaces of increasing dimension that approximates the data progressively better, and to choose the dimension a posteriori, if necessary. For linear subspaces, the natural geometric notion that encodes this structure is that of flag manifolds. I recently showed that Principal Component Analysis (PCA), which is ubiquitous in applied statistics, can be reformulated as an optimization on this flag manifold. The principle can also be extended to manifolds with more complex non-linear subspaces.
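The nested-subspace view can be sketched with ordinary PCA in Euclidean space. The example below is only an illustration under assumptions: it uses the standard SVD-based PCA rather than an explicit optimization on the flag manifold, the helper names `pca_flag` and `residual_variance` are hypothetical, and the data are synthetic. The leading k columns of the returned basis span the k-dimensional subspace, so the subspaces are nested by construction and the dimension can be chosen after the fact by inspecting the residuals.

```python
import numpy as np

def pca_flag(X):
    """Return the data mean and an orthonormal basis whose leading k columns
    span the k-dimensional PCA subspace (a nested sequence of subspaces)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt.T

def residual_variance(X, mean, basis, k):
    """Mean squared distance from the data to the k-dimensional subspace."""
    centered = X - mean
    B = basis[:, :k]
    projected = centered @ B @ B.T
    return np.mean(np.sum((centered - projected) ** 2, axis=1))

# Synthetic data with very different spreads along each axis (an assumption
# made purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([3.0, 1.0, 0.3, 0.05])

mean, basis = pca_flag(X)
# One residual per subspace dimension k = 0, 1, ..., 4.
residuals = [residual_variance(X, mean, basis, k)
             for k in range(basis.shape[1] + 1)]
```

The list `residuals` is non-increasing in k, which is what allows the dimension to be selected a posteriori, at whatever scale of approximation is considered acceptable.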
Finally, a third objective is to demonstrate the efficiency of these methods on selected applications in the life sciences field. Studying the variability of anatomical shapes using medical images is of course an application of choice for this, but other areas will also be considered.
Why did you choose these topics?
Through medical imaging, I have worked since my thesis at the intersection of applications in medicine, computer science and several fields of mathematics, including geometry and statistics. For more than 15 years at Inria, within the project-team Epidaure, then Asclepios and now Epione, I have been developing medical image registration and morphometric methods, which has allowed me to perceive the limits of current methods. For example, to go further in modeling complex shapes, it is necessary to consider changes of topology. Such a change corresponds to a singularity in the shape space, with a stratification. But the behaviour of statistical estimation is very poorly understood under such conditions. For example, colleagues have recently discovered that the mean is attracted towards the singularity under certain conditions (sticky mean), whereas we have shown with the recent theses of Nina Miolane and Loïc Devilliers that it can be repelled under other conditions. It is therefore necessary to better understand the interaction of geometry with statistical estimation in order to discover approximate invariances (empirical laws) in life science data that are highly variable and very noisy. This is what led me to focus on the more fundamental aspects.
What does this grant mean to you?
The selection rate of ERC grants is such that many excellent projects are not selected, despite a peer-review selection system that seems particularly fair to me. This prestigious grant is therefore above all an extraordinary recognition by the scientific community. Beyond my own work, I think it is a recognition of the field of geometric statistics as a whole and of the quality of research at Inria.
More practically, the grant also represents an extraordinary freedom in my research. Most current sources of research funding require upstream theoretical research to be justified by short-term applications. Thanks to this grant, I can devote myself entirely to fundamental theoretical subjects without having to constantly justify them. I think it is important to produce knowledge independently of its use if we want to induce conceptual or technological breakthroughs. Of course, I will illustrate my theoretical developments with applications that highlight the interest of the methodology. But it is the meeting of this scientific knowledge with societal needs that might provide, a posteriori, the trigger for innovation. Not having to worry about it a priori represents a real freedom for research.
How do you plan to use this funding?
The ERC grant will allow me to recruit PhD students and young researchers to work on the above-mentioned subjects. I also plan to organize seminars to which I will invite researchers in this field, and workshops to share progress during the project.
Are there other research tracks you would like to explore in the future?
Yes, of course. A better understanding of the interaction between geometry and statistics could help explain the unreasonable effectiveness of current machine learning methods and, conversely, help us understand their limitations. I am also interested in quantum information because it is based on deep geometric methods. Many other fields present applications at the crossroads of statistics and geometry.
But I already have a busy research agenda for the next 5 years with the G-statistics project!
Five key dates in Xavier Pennec's career
- 1996: Thesis in Computer Science from the Ecole Polytechnique (France)
- 1997: Post-doctoral fellow at MIT (USA)
- 1998: Joined Inria as Junior Research Scientist
- 2007: Research Director
- 2017: ERC Advanced Grant
X. Pennec also teaches at ENS Cachan, Ecole Centrale, and Université Côte d'Azur.