Pharmaceutical research goes digital

Changed on 19/09/2022

Developing new drugs is a long, costly and risky process. But a CIFRE thesis, which sparked the creation of an industrial partnership between pharmaceutical group Servier and Inria, has resulted in the development of three tools: a knowledge graph, algorithms, and mathematical models, which could make this work faster, cheaper and safer.

CIFRE theses: an academic and industrial partnership

“This CIFRE* thesis is actually the realisation of an exploratory systems biology project that we started at Inria in 2001,” laughs François Fages, director of the Inria Lifeware project team. “This proves that the project is bearing fruit!” In 2017, Jeremy Grignard picked up the baton on this topic, during a research internship with François Fages’ team as part of his IT and Big Data Engineering studies at the École Centrale d’Électronique. He then went on to complete a CIFRE thesis, combining two partners, academic and industrial. While Inria Saclay Centre was on board to fill the first role, Jeremy Grignard had to find the second and prospected various companies, including pharmaceutical industrials, pitching them the idea for his thesis: machine learning for models using experiments and temporal data.

Servier welcomed him, as within the idea of modelling lies a sizeable challenge for the pharmaceutical industry: to better understand biochemical processes and signalling pathways involving a dysregulated therapeutic target in an illness to better identify drug candidates that could modulate it. And therefore save time and money. Over 58% of drug candidates fail in phase III (trials of large cohorts of patients with the illness to be treated). This represents the loss of years and billions of euros of investment.

Mathematical modelling to understand biological processes

“A large part of what we needed was investigation, which lent itself well to the thesis,” explains Thierry Dorval, director of the Data Science and Data Management Department at Servier. “And there was no shortage of experimental data for the research!” The question was, how to give this data meaning and best exploit it.

Jeremy Grignard started with a concrete case: molecules developed by Servier have an effect at the biochemical level on a protein involved in a particular process (microtubule tyrosination), but this activity is not found at the cellular level. And the researchers did not understand why. The young engineer therefore developed an expert mathematical model using laboratory data, which provided the explanation: the biological process is based on two reactions, and activating only one is therefore not enough!

Schéma de tyronisation — Influence diagram of the mathematical model of the microtubule tyrosination cycle

The model also revealed another protein, which, when inhibited, made it possible to achieve the desired result... opening up a new avenue for researchers. “The idea is now to develop this type of model for each therapeutic project,” says Thierry Dorval. “This involves studying machine learning for models... and it will be the subject of a future second thesis.”

A reoriented thesis, normalised data

While this theme was at the heart of Jeremy Grignard’s thesis from the beginning, the topic has evolved slightly. “We realised that temporal data, which was meant to be the foundation for Jeremy’s research, was not the kind of data that we had the most of,” continues the Servier specialist. “There was a lot more imaging data.” Jeremy Grignard’s thesis was therefore renamed “Computational Methods to Improve Early Drug Discovery” and the researcher focused on using imaging data. In particular, he worked on phenotypic screening experiments.

This involved staining cells, then using image analysis software to observe modifications in their appearance (phenotype) in response to a genetic disruption or chemical molecule. “The problem resides in the large number of variables measured, which we need to be able to normalise in the hope of then connecting a phenotypic modification with the activity of a molecule or genetic disruption,” emphasises Jeremy Grignard. And so, he developed two algorithms to perform this normalisation.

Criblage Phénotypique — Reduced phenotypic space showing the heterogeneity of responses of positive control compounds identified by the selection algorithm in a cellular model

Their potential application is huge, as Servier is part of the JUMP-CP international consortium, which brings together ten major pharmaceutical companies under the aegis of MIT and Harvard’s Broad Institute. The aim of this organisation is to undertake an enormous phenotypic screening experiment by studying the effects of 120,000 chemical molecules and CRISPR/Cas9 genetic disruptions. Using the normalisation algorithms developed by Jeremy Grignard, and creating others, will make it possible to exploit the big data generated.

Linking and relinking data to better exploit it

But the question remained: how? The answer: through the knowledge graph. This mathematical tool makes it possible to integrate very heterogeneous data in very large volumes, as well as the relations that connect them, in a software capable of processing it all. The software developed by Jeremy Grignard is called Pegasus: today, it contains approximately 46 million nodes, i.e. data, from internal Servier research or the public domain, as well as 331 million relations between this data.

While such numbers make the tool seem abstract, its applications are certainly concrete. For example, Pegasus has already been used to select antisense oligonucleotides (ASOs), small genetic sequences capable of altering the expression of a target protein in a rare illness: epileptic encephalopathy. The future objective is to enable Servier researchers to quickly identify potential ASOs for a given therapeutic target.

Pegasus also makes it possible to identify molecules developed by Servier that are chemically similar to molecules studied in scientific publications for their effect on a therapeutic target. “Instead of screening a large number of molecules to see if they act on the target, which on average leads to a 1% success rate, we could base ourselves on similarity connections found in Pegasus to screen the most interesting molecules directly. This would significantly increase the success rate and quality of molecules identified,” states Jeremy Grignard.

Graphe de Pegasus — Identification by Pegasus of chemical and biological perturbators modulating therapeutic targets participating to the autophagosome assembly

Three complementary, scalable tools

The three tools are perfectly complementary: Pegasus is based on data from mathematical modelling and phenotypic screening experiments that the algorithms can normalise. We have come full circle... or almost. Because the databases are huge, and Pegasus, as well as the related industrial applications, are likely to evolve.

This is also why Jeremy Grignard was hired immediately after the three years of his thesis, in March 2022, as a data scientist at the Servier research institute. ‘CIFRE theses make it possible to answer very concrete questions posed by an industrial company, by developing theoretical approaches that we would not have thought of ourselves,” says François Fages. “Furthermore, the tools developed can open up new avenues for us: we have several research projects for which Pegasus could be very useful.” The collaboration sparked by the CIFRE thesis is certainly not over...

* Industrial agreements for training through research

Find out more

La Méthode scientifique avec Jeremy Grignard, France Culture, 12/1/2022 (in French).
Servier & Pegasus - Graphe de connaissances pour supporter la recherche de nouveaux médicaments, Neo4j, 8/4/2022 (in French).

In the news

See all the news