What is the genesis of your project?
Thanks to the Covidom telemonitoring application, the AP-HP had a large amount of data that it wanted to put to use. The difficulty was that these data were extremely confidential. So two approaches were put in place with Inria as part of the Covidom Stat project. On the one hand, only four Inria engineers had access to the raw data: even though these data are pseudonymised (they contain no names), they can still pose major confidentiality problems. On the other hand, and in parallel, synthetic data were created. Generated from the original data, they retain its statistical properties while strictly adhering to the obligations of confidentiality and anonymity. They were made available to three Inria teams, Modal, Tau and Statify, so that each team could identify what it could contribute according to its expertise and the format of the data.
Christophe Biernacki, Modal project team, Covidom Community project coordinator
Jill-Jênn Vie, Scool project team, scientific lead for the EIT Health project "Covidom Community"
Victor Alfonso Naya
Issam Ali Moindjie
Florence Forbes, Statify project team
Marc Schoenauer and Michèle Sebag, Tau project team
How is it developing today and what are its objectives?
The engineers working on the raw data have developed a program to predict patient arrivals at the hospital, in order to anticipate possible overloads of hospital services. Further developments are underway with the AP-HP epidemiologists.
As for the research teams, they will very soon present to the AP-HP doctors what they are able to do with the synthetic data. This presentation will illustrate the diversity and complementarity of the research carried out by the teams, which is generic but can be applied to the medical data considered here. In particular, this research will contribute at the exploratory level (understanding the interactions between clinical measures, identifying patient typologies) and at the predictive level (estimating a patient's risk of Covid-19 when no test is available, predicting the evolution of the disease).
One difficulty encountered on this project is reliability: the data are entered directly by patients and are therefore less reliable than data recorded by doctors, with a high risk of false positives. One way to make them more reliable is to cross-reference them with other medical data. However, that link is difficult to establish because confidentiality issues quickly arise. This is why generating synthetic data from sensitive data is now a field of research in its own right: how can information useful to doctors be extracted from the traces without compromising user confidentiality?
How do you work with your partners?
We have regular meetings with the doctors, but during the health emergency we had to reconcile different working methods. For Inria, the priority was to have freedom of action for exploratory data mining, to open up "opportunities for discovery", which quickly led to the creation of synthetic databases free of confidentiality constraints. On the AP-HP side, the approach was more protocol-oriented, following the usual and proven recommendations of the medical world. In the end, these very distinct approaches proved highly complementary and useful for medical research, and very close long-term collaborations between Inria and the AP-HP are now under discussion.