

KERDATA Research team

Scalable Storage for Clouds and Beyond

Team presentation

The KerData project-team focuses on designing innovative architectures and systems for scalable data storage and processing. Driven by the needs and requirements of data-intensive applications, we target two types of infrastructure: clouds and post-Petascale high-performance supercomputers.

Examples of such applications are:

  • Cloud data analytics applications (e.g., based on the MapReduce paradigm) handling massive data distributed at a large scale.
  • Advanced (e.g., concurrency-optimized, versioning-oriented) cloud services for user-level data storage.
  • Large-scale simulation applications for Exascale supercomputers.
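To make the MapReduce paradigm mentioned above concrete, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to word counting. It is illustrative only: the function names `map_reduce`, `word_count_mapper`, and `sum_reducer` are hypothetical. A real MapReduce runtime distributes the map and reduce tasks across many nodes and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical single-process sketch of the map, shuffle and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        # Map phase: each input record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: group all intermediate values by key.
            groups[key].append(value)
    # Reduce phase: fold the list of values for each key into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line: str) -> Iterable[Tuple[str, int]]:
    # Emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(_key: str, counts: List[int]) -> int:
    # Total number of occurrences of one word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["big data on clouds", "big data on HPC"]
    print(map_reduce(lines, word_count_mapper, sum_reducer))
    # prints {'big': 2, 'data': 2, 'on': 2, 'clouds': 1, 'hpc': 1}
```

Because the mapper and reducer are pure functions of their inputs, a runtime is free to run many of them in parallel on different data partitions. This property is what lets the paradigm scale to massive, widely distributed datasets.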

Research themes

Convergence of Extreme-Scale Computing and Big Data Infrastructures

  • High-performance storage for concurrent Big Data applications.
  • Big Data analytics on Exascale HPC machines.

Advanced data processing on Clouds

  • Optimizing MapReduce-based data-intensive processing.
  • Stream-oriented, Big Data processing on clouds.
  • Geographically distributed workflows on multi-site clouds.

I/O management, in situ visualization and analysis on HPC systems at extreme scales

  • Scalable I/O and in situ visualization of HPC simulations on post-Petascale platforms using dedicated cores.
  • Mitigating I/O interference between concurrent HPC applications through the study of cross-application interference and I/O prediction.
  • Optimized architectures for in situ visualization and advanced processing.
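The idea of dedicating cores to I/O, listed above, can be illustrated with a toy producer/consumer sketch. Under stated assumptions, a background thread stands in for the dedicated I/O core. The simulation loop hands a copy of its data to that thread through a bounded queue and immediately resumes computing, so storage writes or in situ analysis overlap with computation. All names (`io_worker`, `simulate`) are hypothetical. The "write" step is replaced by computing a mean. Python threads share one interpreter lock, so this shows the control flow only, not real core-level parallelism, which production systems obtain with separate processes (for example, MPI ranks pinned to cores).

```python
import queue
import threading
from typing import List, Optional, Tuple

# Hypothetical message format: (iteration, data snapshot); None means "shut down".
Message = Optional[Tuple[int, List[float]]]


def io_worker(inbox: "queue.Queue[Message]", sink: List[Tuple[int, float]]) -> None:
    """Stands in for the dedicated I/O core: drains snapshots as they arrive."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        iteration, snapshot = msg
        # Placeholder for writing to storage or running in situ analysis:
        # here we only record one reduced value (the mean) per iteration.
        sink.append((iteration, sum(snapshot) / len(snapshot)))


def simulate(steps: int, size: int) -> List[Tuple[int, float]]:
    # Bounded buffer: if the I/O side falls behind, put() blocks the producer.
    inbox: "queue.Queue[Message]" = queue.Queue(maxsize=4)
    results: List[Tuple[int, float]] = []
    worker = threading.Thread(target=io_worker, args=(inbox, results))
    worker.start()

    field = [0.0] * size
    for step in range(steps):
        # Compute phase: a toy update standing in for a simulation kernel.
        field = [x + 1.0 for x in field]
        # Hand off a copy and continue computing without waiting for I/O.
        inbox.put((step, list(field)))

    inbox.put(None)  # signal the I/O worker to finish
    worker.join()
    return results


if __name__ == "__main__":
    print(simulate(steps=3, size=5))
```

The bounded queue is the key design choice in this sketch. It decouples the compute loop from I/O latency while capping memory use, since the producer only stalls when the dedicated worker is several snapshots behind.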

International and industrial relations

  • NCSA/UIUC and ANL: active collaboration within JLESC, the Joint Laboratory for Extreme-Scale Computing (Urbana-Champaign), on concurrency-optimized I/O for post-Petascale infrastructures
  • BigStorage: a Marie Curie Initial Training Network (H2020).
  • Data@Exascale: Associate Team with the "Politehnica" University of Bucharest, Romania
  • ANR OverFlow: data management for geo-distributed workflows on clouds

Keywords: Data management, Cloud, Post-Petascale HPC, Large-Scale BLOB, Distributed File System, BlobSeer, Map-Reduce, Programming Model, Fault-Tolerant Middleware