Project team

KRAKOS

Designing high-performance, robust, secure, flexible, and more energy-efficient system layers

Data centers are an essential pillar of IT infrastructures. They host most of the applications that businesses and individuals use daily, along with the associated data. These increasingly diverse applications must satisfy ever more stringent efficiency requirements (responsiveness, data volumes, energy consumption, etc.). To meet these requirements, data centers are designed with complex, multi-level architectures, characterized in particular by the following aspects:

  1. Large scale (number of physical and virtual servers, internal and external request volumes),

  2. Density and resource sharing (number of applications co-hosted on each physical server),

  3. Heterogeneity (both at the server and the data center scale),

  4. A multitude of hardware accelerators (NVM, GPU, TPU, PIM, FPGA, etc.),

  5. Extremely advanced micro-architectures (AMP, NUMA, DDIO, SGX, etc.).

The system layers (hypervisor, operating system, centralized or distributed runtime) play a critical role in controlling both hardware resources and software activities: they directly impact the security, stability, and efficiency of the data center and, therefore, of the applications it hosts.

Numerous studies have highlighted the growing mismatch between today's system layers and the characteristics of data centers. Current systems are challenging to maintain, evolve, observe/supervise, optimize, and make reliable and secure. Moreover, these goals are often in tension with one another. Generally speaking, these difficulties lead to under-utilization of the potential of hardware resources. These inefficiencies are amplified by the significant and continuous reduction in time scales, both for the latencies of specific hardware resources [Barroso2017] and for the execution time of application tasks [Jonas2019, Lee2019] ("microsecond-scale" computing).

The KrakOS project aims to revisit the fundamental principles that have governed the construction of system layers until now, in order to account for the characteristics of modern data centers and anticipate future developments. KrakOS targets five main objectives and the trade-offs between them:

  1. performance, characterized by application metrics such as execution time, throughput, and latency, as well as statistical indicators of the variability of these metrics;

  2. fault tolerance and high availability;

  3. velocity of development, testing, and deployment (to enable rapid consideration of new requirements);

  4. expressiveness and flexibility of programming interfaces (APIs) to simplify application programmers' work;

  5. energy efficiency. KrakOS aims to achieve the above objectives while at least maintaining, and ideally improving, the energy efficiency of the systems considered.

Like any Systems team, KrakOS is keen to invent new abstractions, concepts, policies, mechanisms, and techniques. Prototyping and empirical evaluation are the preferred methods for validating our proposed contributions. Given the complexity of the systems studied, theoretical proofs are rarely performed in this field. The KrakOS scientific approach is distinctive in three respects:

  1. Revisit and question the relevance of solutions established in Systems (Process and Thread abstractions, for example).

  2. Revisit and question the relevance of solutions that have not broken through (microkernels, for example).

  3. Leverage existing and broadly available hardware functionalities, but also allow ourselves to propose slight hardware modifications when necessary. This last point is recent in the systems community. Traditionally, systems researchers have been content to consider hardware as it exists. Recent experience (the Enzian [4] project at ETH Zürich, for example) has shown that removing this constraint from the systems innovation process stimulates creativity and may open the door to spectacular results.

KrakOS results will be disseminated through publications at major conferences in the field and through the organization of seminars. We will also produce software and patches (since the systems studied by KrakOS are open-source systems). The primary beneficiaries of KrakOS contributions are data center operators and system developers.

 

[4] https://enzian.systems/

Inria centre(s)

Inria centre at Université Grenoble Alpes

In partnership with

Université de Grenoble Alpes, Institut polytechnique de Grenoble, CNRS

Contacts

Team leader

Annie Simon

Team assistant
