What is the origin of your project?
Project leader: Fabien Gandon, head of the Wimmics project-team
Partners: Université Côte d'Azur, CNRS, I3S
Assisting the use of scientific literature on coronaviruses
Scientists from all domains are harnessing their multidisciplinary expertise and resources to fight the Covid-19 pandemic. To contribute to this effort, the Wimmics team decided to use the confinement period to launch the Covid-on-the-Web project as a sprint, adapting and combining its methods, models and tools (ACTA, Corese, MGExplorer, Morph-xR2RML) to process, analyse and enrich the “Covid-19 Open Research Dataset” (CORD-19), which gathers 50,000+ full-text scientific articles related to coronaviruses.
How is it evolving today and what are its objectives?
Extracting, publishing and visualizing a knowledge graph about Covid-19
The goal is to make it easier for biomedical researchers to access, query and make sense of Covid-19 related literature. We designed a pipeline that continuously enriches a knowledge graph about Covid-19, together with software to exploit it, leveraging knowledge representation, text, data and argument mining, and data visualization and exploration. The pipeline extracts the named entities mentioned in the articles and links them to DBpedia, Wikidata and BioPortal vocabularies. It also extracts argumentative graphs, meant to help clinicians analyse clinical trials and make decisions. On top of this knowledge graph, we developed, adapted and deployed several tools that provide visualizations and exploration methods, as well as notebooks for data scientists.
How do you work with your partners?
Addressing motivating scenarios and competency questions from biomedical institutions
Several biomedical institutions have shown interest in using our resources, whether as direct project partners (French Institute of Medical Research - Inserm, French National Cancer Institute - INCa) or indirect ones (e.g., Antibes Hospital, Nice Hospital). For now, these institutions act as potential users of the resources and as co-designers. Through active discussions with INCa and Inserm, we are ensuring that our approach is guided by and aligned with the actual needs of the biomedical community. Following a user-oriented approach, we are designing the tools and resources according to motivating scenarios identified through an analysis of the needs of the biomedical institutions. One of the very first example queries they suggested we work on was “find all articles that talk about both a type of cancer and a virus of the corona type”. We are constantly eliciting meaningful new queries from the potential users we interview, and these queries serve to specify and test our knowledge graph and services.
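To make this competency question concrete, here is a minimal sketch in Python of what "articles that mention both a type of cancer and a coronavirus" means once named entities have been extracted and typed. The article identifiers, entity names, type labels and the function `articles_mentioning_all` are all hypothetical, invented for illustration. They are not the project's actual data model or query interface; a real deployment would express this as a query over the knowledge graph.

```python
from collections import defaultdict

# Hypothetical toy data: (article, mentioned entity) pairs, as a named-entity
# extraction step might produce them. All identifiers are invented.
MENTIONS = [
    ("article-1", "lung carcinoma"),
    ("article-1", "SARS-CoV-2"),
    ("article-2", "MERS-CoV"),
    ("article-3", "leukemia"),
    ("article-4", "glioma"),
    ("article-4", "SARS-CoV-1"),
    ("article-4", "HPV"),
]

# Hypothetical typing of entities. A real knowledge graph would take these
# classes from linked vocabularies rather than from a hand-written table.
ENTITY_TYPE = {
    "lung carcinoma": "cancer",
    "leukemia": "cancer",
    "glioma": "cancer",
    "SARS-CoV-2": "coronavirus",
    "SARS-CoV-1": "coronavirus",
    "MERS-CoV": "coronavirus",
    "HPV": "virus",
}


def articles_mentioning_all(mentions, entity_type, required_types):
    """Return the sorted articles that mention at least one entity of every required type."""
    types_by_article = defaultdict(set)
    for article, entity in mentions:
        kind = entity_type.get(entity)
        if kind is not None:  # ignore entities with no known type
            types_by_article[article].add(kind)
    required = set(required_types)
    # Keep an article only if its set of entity types covers all required types.
    return sorted(a for a, kinds in types_by_article.items() if required <= kinds)


matches = articles_mentioning_all(MENTIONS, ENTITY_TYPE, {"cancer", "coronavirus"})
print(matches)  # ['article-1', 'article-4']
```

Grouping entity types per article and then testing set inclusion keeps the question declarative: a new competency question of the same shape only changes the set of required types.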
The SARS-CoV-2 outbreak is linked to a so-called emerging virus. Since its appearance in China in December 2019 and its emergence on a global scale from January 2020, the effects of this virus have gradually been discovered as the epidemic has progressed, such as the broad spectrum of affected organs (ENT, lung, nervous system, skin, etc.).
However, the links between SARS-CoV-2 infection (asymptomatic forms, severe forms or even possible reinfections) and cancer are not known. Moreover, the role of several viruses in the development of different types of cancer has been demonstrated (e.g. HPV, HBV, EBV, etc.) or is suspected to a greater or lesser extent (IARC monographs).
Thus, in addition to the fate of cancer patients who are secondarily affected by SARS-CoV-2, the role of the virus in the medium or long term in the predisposition to cancer, and its possible involvement in the evolution of a cancer or the appearance of a second cancer (e.g. pulmonary, ENT, brain, etc.), cannot be excluded. In addition, and retrospectively, it would be relevant to study the impact of the first two coronavirus epidemics, SARS-CoV-1 and MERS-CoV, which appeared in 2002 in South-East Asia and in 2012 in the Middle East respectively, on the subsequent development of cancer and, more broadly, their impact in relation to cancer.
It is in this context that the collaboration between INCa and the Wimmics team was born.
Indeed, the Wimmics team's expertise in the Semantic Web proved necessary and indispensable for identifying potential links between cancer and coronaviruses. The team can translate informal discussions or research hypotheses into specific queries in order to retrieve all the relevant data. This collaboration further highlights how complementary this partnership is and how necessary it is to develop advanced search tools able to retrieve information of any kind (not limited to peer-reviewed journals) in order to study potential links between cancer and infection by one of the coronaviruses. This work will also make it possible to anticipate the possible impact on the development of cancer and to propose suitable research programming based on the research questions that will be identified.
Karima Bourougaa, PhD, Head of Scientific Affairs, Research and Innovation Division, INCa