Changed on 14/04/2021
Ioana Manolescu, director of the Cedar research team, a joint team of Inria and the École polytechnique, is determined to put billions of pieces of data to effective use and to make them understandable, interesting and usable by all, including non-specialists. The researcher describes her career path in three questions.
Ioana Manolescu
© Inria / Photo S. Erôme - Signatures

How did you get into the world of research?

I come from a family of scientists, with specialisations in rail telecommunications, hydro-electric power plants, physics, and mathematics with IT; but I am only the second Doctor of Science in the family!

Her beginnings: Ioana Manolescu started studying computer engineering at the Politehnica University in Bucharest, Romania. In 1996, she arrived in France as a foreign student at ENS Paris, holding a scholarship co-financed by the Ministry of Foreign Affairs and the George Soros Foundation.

Did you know? After a post-doctoral stay at Politecnico di Milano (Italy), Ioana joined the Gemo team led by Serge Abiteboul at Inria Futurs, then Saclay, as part of the very first class of researchers recruited when the three "new" centres were created.

When I arrived in France, at the ENS Paris [graduate school], I discovered colleagues who shared the same enthusiasm and passion for the studies they were involved in as the colleagues I had left behind in the IT and Robotics Department of Bucharest Polytechnic University. That said, the scope of IT fields on offer at ENS was relatively narrow at that time. I discovered database research at INRIA, in spring 1997, during a Master 1 work placement with the RODIN team in Rocquencourt. I loved it: the idea of exploring a single theme full-time, the blend of theory and experimentation, coupled with the atmosphere and kindness of the highly international team. I knew straight away that I wanted to come back!

What are the focal areas of your work at present?

I’ve always worked on several projects at once; I can’t channel all my energy into a single lead. Right now, most of my work is aimed at making data understandable, interesting and usable, and doing all of that efficiently of course, because the volume of data keeps growing and the efficiency of algorithms is therefore crucial. SourcesSay, my key project at the moment, is aimed at making data talk: any type of data, from any model and any source, by interconnecting it in a graph or network. SourcesSay is funded by the ANR [French National Research Agency] and the DGA [French Defence Procurement Agency], and we have worked with journalists from Le Monde newspaper and with WeDoData, a dataviz agency/studio. Angelos Anadiotis and Oana Balalau, my colleagues from CEDAR, have made invaluable contributions.

Another project that aims to make data talk is a collaboration with my colleague Yanlei Diao. The idea is to automatically discover, in large data graphs, the statistical questions that produce interesting or surprising results. This is a new way of exploring data: it offers users leads or questions for further analysis.

What are your long-term goals or ambitions?

In the long term, I think it is important to achieve the universal accessibility and usability of data. Humanity has always produced data, and the proportion of digital data has been growing steadily for the last 70 years or so.

The first database systems consisted of specialised and somewhat esoteric software. For around ten years, data was produced in the right format, stored in the right system and used for the right application. Implementing and then operating this kind of system is part of the “basic” toolkit of a computer science student today. However, data is diversifying, multiplying and becoming heterogeneous; data sharing and reuse processes are “mushrooming”, or fragmenting, and are no longer reserved for experts.

We have to make data use easy, even for non-experts. We also have to make it easier to understand what a data set holds and whether or not it is useful for a given need. Finally, we must make data truly and immediately interoperable, even if such data does not follow the “best practices” recommended by specialists. This is the case for most of the open data on the Internet and for the data accessible to journalists and citizens.

Making data accessible, intelligible and usable today means rethinking the rigid “silo” design of data management in order to create flexible, efficient tools. Democracy is based on choices which are informed by both values and data. Data must be able to talk to everyone!

Director of Research since 2020, Ioana Manolescu headed the LEO team, followed by CEDAR, which became part of LIX, the computer science research laboratory at the École polytechnique, in 2016. Her research focuses on the management of large volumes of complex and heterogeneous data, in particular for applications in data journalism and journalistic fact-checking. She also works on hybrid (“polystore”) architectures for the integration of large data volumes and on the analysis of Semantic Web graphs. She has co-authored more than 150 publications in national or international conferences and journals and is the co-author of two books: Web Data Management and RDF Data Management in the Cloud. She is a member of the steering committee of PVLDB, the international reference journal in the field of large-scale data management, and has chaired programme committees and/or conferences such as EDBT, ICDE, SSDBM and ICWE.