The genomic data produced by sequencing the DNA of cells is enabling crucial advances in medicine, ecology and agronomy. This valuable sequencing data is accumulating exponentially in public genomic databases such as ENA (48 petabytes by 2023). However, it cannot be exploited on a large scale, because there is no efficient method for querying it. These data treasures are like the internet without a search engine: largely under-exploited.
To make full use of this wealth of information, the OmicFinder project brings together four Inria teams. They work on new algorithms and data structures, on ontologies for making the best use of the metadata associated with the sequence data, on distributing the proposed indexes, and on reducing the environmental impact of the resulting search engines. The external partners are CEA-GenoScope, Elixir, Institut Pasteur, Inria Challenge OceanIA, CEA-CNRGH and Institut Méditerranéen d'Océanographie. They contribute to the algorithmic developments and provide validation and use cases.