A look back at a very successful call for projects
Financed by the French National Research Agency (ANR) within the framework of the Investments for the Future programme, the DATAIA Institute in data sciences, artificial intelligence and society is the convergence institute in France dedicated to the data sciences and their disciplinary and application interfaces.
For this first call for research projects, the DATAIA Institute received 32 proposals. In accordance with the eligibility criteria, each proposal was based on a collaboration between at least two people from two of the DATAIA Institute's 14 founding institutions (which include Inria), who did not belong to the same laboratory or the same host institution.
The specifications for the call for projects also stipulated that each proposal had to show how it contributed to the DATAIA Institute's objectives, in particular to the structuring of the field of data science within the Paris-Saclay campus and the creation of new synergies between the different scientific actors interfacing with users or data producers.
The projects were evaluated against very specific criteria: scientific excellence, synergy between the partners, interdisciplinarity and the potential application impact of the subject covered. On this basis, and after the programme committee had reviewed the proposals received, the representatives of twelve projects were selected and auditioned on 9 April.
The Inria Saclay - Île-de-France centre involved in three of the five projects selected
At the end of these auditions, the selection committee chose five research projects, with subjects ranging from the prediction of the "prosumption*" of renewable energy, to the exploitation of data to assist with job seeking, to ethics in the interaction between conversational agents.
One of the winners is Nicolas Anciaux, head of the Petrus team, with his GDP-ERE project, carried out in collaboration with the University of Versailles Saint-Quentin-en-Yvelines (UVSQ). This research project focuses on the new European regulation on the protection of personal data (GDPR) and the personal cloud:
GDP-ERE - GDPR and personal cloud: from Empowerment to REsponsibility
• Project leaders: Nicolas Anciaux, Inria Saclay – Île-de-France; Mélanie Clément-Fontaine, UVSQ; Philippe Pucheral, Inria Saclay – Île-de-France – UVSQ; Guillaume Scerri, Inria Saclay – Île-de-France – UVSQ; Célia Zolynski, UVSQ.
• Abstract: As we approach a world drastically changed by artificial intelligence and the exploitation of personal data, the place of individuals and the control of their data have become central issues in the new European regulation on data protection (GDPR) and the French law for a Digital Republic. The GDP-ERE project has a twofold objective: to analyse the impact of personal cloud architectures on liability issues and compare this analysis with the rules laid down by the GDPR, and to propose legislative and technological changes that better capture how liability should be shared between the different parties and that provide each of them with the appropriate tools to take on these liabilities. Portability establishes a right for individuals to retrieve their personal data, and opens up opportunities for empowerment and for the development of new uses, such as personal big data and big personal data, carried out under the control of the individual. For now, the legal framework is limited to prescribing this right to portability whilst recognising that it comes with new forms of liability. It does not, however, specify how liability is divided between individuals, platform providers and service providers, nor does it take into account the variety of personal cloud technical solutions. The aim of the GDP-ERE project is to analyse this dual movement, legal and technical, in order to establish more accurately the liabilities inherent to empowerment, in compliance with the existing legal concepts of processor, outsourcers and third parties, and the exemptions of the GDPR. It also envisages recommending platforms that offer individuals a progressive level of accountability, adapted to the technology.
Gaël Varoquaux, a researcher with the Parietal team, has also been chosen by the DATAIA Institute for his project entitled MissingBigData, in collaboration with the CNRS. As its name indicates, this research project focuses on the challenges of missing data in big data:
MissingBigData: missing data in the big data era
• Project leaders: Julie Josse, CMAP - CNRS; Gaël Varoquaux, Inria Saclay – Île-de-France.
• Abstract: Big data, which is often observational and composite rather than experimental and homogeneous, poses challenges with regard to missing data. We propose to use more powerful models that can benefit from large data samples, in particular auto-encoders, to impute missing values. In order to avoid skewing conclusions, we will study multiple imputation and conditions on dependency in the data. In healthcare, our project aims to reduce risks through better prediction of outcomes and the identification of risk factors for adverse outcomes. We are seeking an operational solution, from the methodology to the implementation, which integrates the diversity and volume of the data. We are also moving away from classic studies by considering several types of missing data. This will be a first, but it seems feasible in view of the results of Mohan and Pearl (2018).
Michèle Sebag and Philippe Caillou, researchers with the TAU team, have been selected for their project Vadore, in collaboration with the CNRS and ENSAE, which aims to add value to data for job seeking:
Vadore: “Adding Value to Data for Job Seeking”
• Project leaders: Bruno Crépon, ENSAE; Michèle Sebag, CNRS; Marco Cuturi, ENSAE; Christophe Gaillac, ENSAE; Philippe Caillou, LRI (Laboratory for Computer Science).
• Abstract: The context of the project is that of unemployment in France. Unemployment is a phenomenon with multiple causes, depending in particular on the factors limiting labour supply and demand. This project focuses on frictional unemployment, which is related to informational imperfections due to the costs of collecting, processing and disseminating information, as well as to the information asymmetry between employers and job-seekers and the cognitive limitations of individuals. These imperfections are one of the reasons why certain jobs remain vacant even when high demand for employment is observed in the same employment sectors. The central idea of the project is to mobilise all of the available information in order to improve the match between job-seekers and vacancies. The project relies on the considerable body of information on job-seekers and companies, some of which (textual data in particular) is still unexploited. This information will be used to develop two functionalities that differ in their technical nature and economic inspiration, to assess them and to compare them closely.
For these winners, the DATAIA Institute's support consists of funding either one thesis and a two-year fixed-term contract, or two theses, and, potentially, operating costs.
Another Inria Saclay - Île-de-France centre project is close to being selected
As the quality of the projects received was very high, the Institute decided to draw up a complementary list of three projects, and is currently studying the financial support arrangements it could offer them. One of these three projects is HistorIA, an interdisciplinary project for the development of large historical databases, led by Jean-Daniel Fekete, head of the Aviz team, in collaboration with Télécom ParisTech.
HistorIA: Large historical databases. Data mining, exploration and explicability
• Project leaders: Jean-Daniel Fekete, Inria Saclay – Île-de-France; Christophe Prieur, Télécom ParisTech.
• Abstract: Since the development of big data methods and their arrival in the social sciences, several very ambitious initiatives have emerged with the aim of changing the way historical research is carried out. However, the deployment of these new approaches raises numerous concerns among historians who, faced with the difficulty of interdisciplinary dialogue, are often sceptical about the very purpose of a collaboration in which they worry, sometimes rightly, about being stripped of material they no longer feel they control once it has been transformed for integration into databases. Both these transformation procedures and the analysis procedures raise profound methodological and even epistemological uncertainties, especially since the tools implemented are often innovative and have therefore not benefited from extensive feedback. In this project, which brings together researchers in history, computational social sciences and information visualisation, we wish to develop large historical databases by applying data mining methods, in particular around the analysis of relational networks, whilst also implementing an iterative approach to the exploration process, based on the users' appropriation of the procedures and tools used as well as of the results of the analyses. To this end, the emphasis will be placed on the explicability of the algorithms, the progressive analysis of the data and human-computer interaction.
There is no doubt that the quality of the projects and the richness of the subjects proposed more than met the expectations of the Convergence Institute. Watch this space to learn more about the "GDP-ERE" and "MissingBigData" projects, whilst hoping that "HistorIA" also receives support.
* Prosumption: More active and more critical consumption, through increasing information for the consumer.