ZENITH Research team

Scientific Data Management

Team presentation

Data-intensive sciences such as agronomy, astronomy, biology and environmental science must deal with overwhelming amounts of experimental data produced through empirical observation and simulation. The three main challenges of scientific data management can be summarized as: (1) scale (big data, big applications); (2) complexity (uncertain, multi-scale data with many dimensions); (3) heterogeneity (in particular, heterogeneity of data semantics).

The overall goal of Zenith is to address these challenges by proposing innovative solutions with significant advantages in terms of scalability, functionality, ease of use, and performance. To produce generic results, we express these solutions as architectures, models and algorithms that can be implemented as components or services in specific computing environments, e.g. grid or cloud.

We design and validate our solutions by working closely with our scientific application partners such as INRA and IRD in France, or the National Research Institute on e-medicine (MACC) in Brazil. To further validate our solutions and extend the scope of our results, we also foster industrial collaborations, even in non-scientific applications, provided that they exhibit similar challenges.

Research themes

Our approach is to capitalize on the principles of distributed and parallel data management. In particular, we exploit: high-level languages as the basis for data independence and automatic optimization; data semantics to improve information retrieval and automate data integration; declarative languages (algebra, calculus) to manipulate data and workflows; and highly distributed and parallel environments such as P2P, cluster and cloud.

To reflect our approach, we organize our research program in four complementary themes:

  1. data search, including machine learning and content-based image retrieval;
  2. data analytics, including scientific workflows and data mining;
  3. data integration, including data capture and cleaning;
  4. data management, in particular, indexing and privacy.

International and industrial relations

International: UFRJ and LNCC (Brazil), U. Waterloo (Canada), UCSB (USA), NUS (Singapore), UPC and UPM (Spain).

Industry: Beepeers, LeanXcale, Data Publica, Bull/ATOS, SAFRAN, EDF, Orange, Microsoft.

 

Keywords: Data science, Big data, Scientific data, Cluster, Cloud, Peer to peer, Distributed and parallel data management, Data integration, Privacy, Data analytics, Machine learning, Data search, Content-based image retrieval.