ZENITH Research Team
Scientific Data Management
- Team leader: Patrick Valduriez
- Type: Project team
- Research center(s): Sophia
- Domain: Perception, cognition, interaction
- Theme: Data and knowledge representation and processing
- Université des sciences et techniques du Languedoc (Montpellier 2), CNRS, Laboratoire d'informatique, de robotique et de microélectronique de Montpellier (LIRMM) (UMR5506)
Team presentation
Modern sciences such as agronomy, bioinformatics, and environmental science must deal with overwhelming amounts of experimental data. Such data must be processed (cleaned, transformed, analyzed) in many different ways in order to draw new conclusions, prove scientific theories and produce knowledge. However, constant progress in scientific observational instruments and simulation tools creates a huge data overload. For example, climate modeling data are growing so fast that collections are expected to reach hundreds of exabytes by 2020. Scientific data are also very complex, in particular because of the heterogeneous methods used to produce them, the uncertainty of captured data, the inherently multi-scale nature of many sciences and the growing use of imaging, which results in data with hundreds of attributes, dimensions or descriptors. Processing and analyzing such massive sets of complex data is therefore a major challenge, since solutions must combine new data management techniques with large-scale parallelism in cluster, grid or cloud environments. The three main challenges of scientific data management can be summarized as:
- scale (big data, big applications);
- complexity (uncertain, multi-scale data with many dimensions);
- heterogeneity (in particular, heterogeneity of data semantics).
Research themes
Our approach is to capitalize on the principles of distributed data management. In particular, we plan to exploit: high-level languages as the basis for data independence and automatic optimization; data semantics (taxonomies, folksonomies, ontologies, …) to improve information retrieval and automate data integration; declarative languages (algebra, calculus) to manipulate data and workflows, with user-defined functions; and user (social) profiles and relationships between participants to support recommendation. Furthermore, we will exploit highly distributed environments, in particular P2P for data sharing between participants and parallel processing to scale up in the cloud. To reflect our approach, we organize our research program in three complementary research themes:
- Data and Metadata Management. This theme addresses the problems of managing and integrating data and metadata with uncertainty, in particular, n-way schema matching and distributed probabilistic query processing.
- Data and Process Sharing. This theme addresses the problems of sharing scientific data and processes in highly distributed and parallel environments, in particular, social-based P2P data sharing and scientific workflow management.
- Scalable Data Analysis. Given the gap between the growth of computing power and that of data production, our ability to analyze these data is inevitably at stake. This theme addresses the scalability problem by investigating new data mining and content-based retrieval techniques that exploit parallelism in the cloud.
Industrial and international relations
International
- Associate Team (Equipe Associée) Sarava (2009-2011) with UFRJ, Rio de Janeiro, on P2P data management for online communities.
- CNPq-INRIA project DatLuge (Data & Task Management in Large Scale, 2010-2012) with UFRJ and LNCC, Rio de Janeiro, and UFPR, Curitiba, on large-scale scientific workflows.
- EGIDE Picasso project Scaling GraphDB (2010-2011) with UPC, Barcelona, on very large graph database support.
- EGIDE Osmoze project SECC (SErvices for Curricula Comparison, 2011-2012), with Riga Technical University on automatic analysis and mapping of conceptual trees and maps acquired from digital documents.
Keywords: Scientific data, Uncertain data, Data processing, Data analysis, Data sharing, Scientific workflows, Data integration, Content-based retrieval, P2P, Cloud.
Research teams in the same theme:
- AXIS - Usage-centered design, analysis and improvement of information systems
- DAHU - Verification in databases
- DREAM - Diagnosis, action recommendation and modeling
- EXMO - Computer-mediated exchange of structured knowledge
- GRAVITE - Graph visualization and interactive exploration
- MAIA - Autonomous intelligent machine
- MOSTRARE - Modeling tree structures, machine learning and information extraction
- OAK - Optimizations and Architectures for Complex large data
- ORPAILLEUR - Knowledge representation, reasoning
- SMIS - Secured and mobile information systems
- TYREX - Types and reasoning for the web
- WAM - Web, adaptation and multimedia
- WIMMICS - Web-Instrumented Man-Machine Interactions, Communities and Semantics
Contact
Team leader
Patrick Valduriez
Tel: +33 4 67 14 97 26
Secretariat
Tel: +33 4 67 41 86 88