ClinMine: analyzing patient trajectories in hospital
Length of hospitalization, care protocols, examination results, transfer of records between doctors... French hospitals generate a significant amount of data. Knowing how to process and analyze these data could help optimize patient care. This is precisely the aim of the ANR ClinMine project, in which members of the Modal¹ project team from the Inria Lille - Nord Europe center, among others, have taken part.
“In a hospital, the trajectory of a patient produces a significant amount of complex data”, explains Cristian Preda, professor at Polytech'Lille and ClinMine project initiator for Inria. “Hospitals increasingly believe that it could be of interest to exploit these data in a more in-depth way.” This is the backdrop to the creation of the ClinMine project, in January 2014, bringing together six partners from various areas: hospitals, research centers, businesses, etc. Backed by the French National Research Agency (ANR), this research program aimed to develop statistical methods for analyzing the information collected by the hospitals' PMSI (Programme de médicalisation des systèmes d'information, the program for the medicalization of information systems), in order to derive typologies of care trajectories. The methods were then applied in case studies, such as the analysis of resistance to certain bacteria.
The researchers were faced with a problem: the patient trajectory in hospital is, by definition, dynamic, which complicates the processing of the health data. “Today, as far as statistics are concerned, we are rather good at analyzing data that evolve over time when they are quantitative. This is the case, for example, with a temperature curve. In hospital, the patient goes from one department to another; there is a different diagnosis, a different medical procedure... All of these qualitative parameters evolve over time. This is not easy to process, and the computing methods for it are relatively recent”, Cristian Preda specifies.
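To make the idea of a qualitative parameter evolving over time more concrete, here is a minimal sketch in Python. It is only an illustration, not the method developed within ClinMine: the trajectory, the department names and the two helper functions are hypothetical. A patient's path is stored as a list of (day of entry, department) pairs, from which the time spent in each department and the moves between departments are computed.

```python
from collections import Counter

# Hypothetical trajectory: (day of entry, department) pairs, sorted by time.
trajectory = [
    (0, "emergency"),
    (1, "cardiology"),
    (4, "intensive care"),
    (6, "cardiology"),
]
discharge_day = 9  # hypothetical end of the hospital stay


def time_in_each_state(path, end):
    """Total time spent in each state of a piecewise-constant categorical path."""
    totals = Counter()
    # Pair each entry with the following one; the last state lasts until `end`.
    for (start, state), (next_start, _) in zip(path, path[1:] + [(end, None)]):
        totals[state] += next_start - start
    return dict(totals)


def transition_counts(path):
    """Number of observed moves from one state to the next."""
    states = [state for _, state in path]
    return dict(Counter(zip(states, states[1:])))


durations = time_in_each_state(trajectory, discharge_day)
moves = transition_counts(trajectory)
```

Unlike a temperature curve, such a path cannot be averaged or differentiated; summaries of this kind (time per state, transitions) are one simple way of comparing trajectories.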
Analysis of letters from the Institut Catholique in Lille
At the Inria center in Lille, Modal specializes precisely in the processing of data that are complex, either because of their size or their structure. Within this project team, Cristian Preda works specifically on temporality issues. Supported by four researchers, including a post-doctoral researcher, he has therefore concentrated on this question within ClinMine. To do so, they worked on a case study: the analysis of all of the letters sent to patients between January 2012 and May 2016 by the hospital group of the Institut Catholique in Lille (GHICL).
“We began with a significant amount of simulation work”, the researcher remarks. “It was a matter of validating our calculations and the volume of data they could process.” These methods then needed to be applied to the real data, a stage which proved to be long and complicated. The team found itself with 400,000 consultation letters and 600,000 hospitalization letters to convert into usable data, and many of them turned out to be incomplete or incorrect. “There were many inconsistencies. Our technical tools were detecting patterns that we could not interpret. Yet, for the analysis of complex data, it is imperative to start off with reliable material.” The team therefore had to undertake a big 'clean-up' operation. “We organized several meetings with the hospital's IT departments in order to find explanations and solutions to these anomalies. This data preparation stage represented 50% of our work. We had not anticipated a problem on this scale, even if we know that, in the analysis of real data, the surprise factor is always very significant.”
The analysis of the letters brought out several typologies, some of which revealed malfunctions: letters that took too long to write, excessive validation times, or differences in how letters were processed depending on the department concerned. Ultimately, these data revealed problems within the hospital, either on an IT level or with regard to human resources. “After our analysis, the hospital will take ownership of the results in order to explain or understand the different patterns identified”, Cristian Preda remarks.
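As an illustration of the kind of indicator involved, the short Python sketch below compares writing and validation delays between departments. It is not the software designed by the team: the records, dates, department names and the `median_delays` function are hypothetical and serve only to show how such delays could be summarized.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical records: (department, date of the stay, date written, date validated).
letters = [
    ("cardiology", date(2015, 3, 2), date(2015, 3, 4), date(2015, 3, 5)),
    ("cardiology", date(2015, 3, 10), date(2015, 3, 11), date(2015, 3, 20)),
    ("geriatrics", date(2015, 3, 3), date(2015, 3, 17), date(2015, 3, 18)),
    ("geriatrics", date(2015, 3, 8), date(2015, 3, 20), date(2015, 3, 22)),
]


def median_delays(records):
    """Median (writing delay, validation delay) in days for each department."""
    writing = defaultdict(list)
    validation = defaultdict(list)
    for department, seen, written, validated in records:
        writing[department].append((written - seen).days)
        validation[department].append((validated - written).days)
    return {
        department: (median(writing[department]), median(validation[department]))
        for department in writing
    }


delays = median_delays(letters)
```

With these made-up records, one department stands out for its writing delay and the other for its validation delay, which is the sort of contrast between departments that the article describes.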
As part of this case study, the team designed software released under an open license. “It could, for example, be used by other statisticians to study how qualitative parameters evolve over time in various contexts (public health, economy, etc.).”
After 42 months of research, the ANR project will come to an end next December. Cristian Preda draws a positive conclusion from it: “From a human perspective, it is a success to have been able to bring together computer scientists, statisticians and hospital staff within the same research program. From a scientific perspective, we have managed to develop innovative methods which have been recognized in various publications.”
However, the team does not intend to stop there. Indeed, several of the project's areas of research could not be concluded. “In particular, we were not able to analyze the data from the Lille University Hospital (CHU) within the allocated time.” For this case study, the researchers were hoping to find a method for detecting the possible onset of cognitive impairment by analyzing the medical history of the patients. The target: to find predictive trigger factors for Alzheimer's disease. “It is a subject that interests a lot of people. And now that we have the data, we think that it would be a good idea to continue the work within ClinMine 2.”
Partners of the project
Six partners joined forces on the project coordinated by the CRIStAL laboratory of the Université de Lille: the company Alicante, EA (host team) 2694 "Public Health: Epidemiology and Quality of Care", the Institut Catholique de Lille hospital group (GHICL), EA 1046 "Alzheimer's Disease and Vascular Pathologies", LIFL (Lille Computer Science laboratory), and the Inria Lille - Nord Europe research center.
¹ Modal is a joint project team with the CNRS, Université Lille 1 - Sciences et Technologies and Université Lille 2 - Droit et Santé, within UMR 8524 CNRS-Université Lille 1 - Sciences et Technologies (the Paul Painlevé laboratory) and EA 2694 "Public Health: Epidemiology and Quality of Care" of the Université Lille 2 - Droit et Santé.