KERDATA Research team

Scalable Storage for Clouds and Beyond

Team presentation

The KerData project-team designs innovative architectures and systems for scalable data storage and processing. We target two types of infrastructures, clouds and post-Petascale high-performance supercomputers, in line with the current needs and requirements of data-intensive applications.

Examples of such applications are:

  • Cloud data analytics applications (e.g., based on the MapReduce paradigm) handling massive data distributed at a large scale.
  • Advanced (e.g., concurrency-optimized, versioning-oriented) cloud services for user-level data storage.
  • Large-scale simulation applications for Exascale supercomputers.

Research themes

Convergence of Extreme-Scale Computing and Big Data Infrastructures

  • High-performance storage for concurrent Big Data applications.
  • Big Data analytics on Exascale HPC machines.

Advanced data processing on Clouds

  • Optimizing MapReduce-based data-intensive processing.
  • Stream-oriented Big Data processing on clouds.
  • Geographically distributed workflows on multi-site clouds.

I/O management, in situ visualization and analysis on HPC systems at extreme scales

  • Scalable I/O and in situ visualization of HPC simulations on post-Petascale platforms using dedicated cores.
  • Mitigating I/O interference in concurrent HPC applications through the investigation of cross-application interference and I/O prediction.
  • Optimized architectures for in situ visualization and advanced processing.

International and industrial relations

  • MapReduce: an ANR project on MapReduce-based cloud data management with international and industrial partners: Argonne National Lab (USA), the University of Illinois at Urbana-Champaign (UIUC, USA), and IBM.
  • FP3C: an ANR-JST project on programming post-Petascale infrastructures, gathering the major French and Japanese academic actors in this area. Strong collaboration with Tsukuba University, Japan.
  • NCSA/UIUC: active collaboration with the JLPC (Urbana-Champaign) on concurrency-optimized I/O for post-Petascale infrastructures.
  • SCALUS: a Marie Curie Initial Training Network (FP7).
  • DataCloud@work: Associate Team with the "Politehnica" University of Bucharest, Romania.

Keywords: Data management, Cloud, Post-Petascale, HPC, Large-Scale, BLOB, Distributed File System, BlobSeer, Map-Reduce, Programming Model, Fault-Tolerant, Middleware