Inria & HPC
StarPU: harnessing the performance of heterogeneous architectures through generic development
StarPU is a software component that provides a unified, generic view of the computing infrastructure available, whatever its kind: CPUs, GPUs, accelerators.
StarPU offers an API that automatically executes code on the available resources, taking data dependencies into account in order to optimise the computations. The result is a single programming interface to learn and use, which allows for clearer, more generic software that lets users make the most of their infrastructure and thereby improve their return on investment (ROI).
StarPU aims to address the following user issues:
- Applications developed independently of the underlying infrastructure suffer reduced performance and scalability
- Programmers must otherwise handle, by hand and at the programming level, the interaction between the heterogeneous computation resources potentially available to the end client
- Programming difficulty that results in the underutilisation of available computation resources
With StarPU, users benefit from:
- Improved support for heterogeneous architectures
- Time savings in the resolution of computations
- Simplified software development
- Applications able to exploit infrastructures that were previously underutilised
- Developers free to concentrate on their core business rather than on handling the interaction between heterogeneous resources
- Optimised ROI on the infrastructure and reduced TCO for both the infrastructure and the development teams
StarPU enables HPC libraries and compiler environments to exploit, in a much simpler way, heterogeneous multicore machines equipped with CPUs, GPUs or even Cell processors. Instead of managing low-level issues, programmers can concentrate on the algorithmic aspects.
Portability is achieved through a unified abstraction of the machine's computation resources. Moreover, StarPU offers a unified off-loadable task abstraction called a “codelet”. Instead of rewriting all of their code, programmers can encapsulate existing functions in codelets.
When a codelet may run on heterogeneous architectures, it is possible to specify one implementation per architecture (e.g. one function for CUDA and one function for CPUs).
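As an illustration, a codelet can be declared with one implementation per target architecture, following StarPU's documented codelet structure (the function names below are illustrative, not part of StarPU itself):

```c
#include <starpu.h>

/* CPU implementation: scale a vector in place.
 * StarPU passes the registered data through `buffers`. */
static void scal_cpu(void *buffers[], void *cl_arg)
{
    float factor = *(float *) cl_arg;
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    float *v = (float *) STARPU_VECTOR_GET_PTR(buffers[0]);
    for (unsigned i = 0; i < n; i++)
        v[i] *= factor;
}

/* CUDA implementation (defined elsewhere): launches a kernel
 * on the same logical data. */
extern void scal_cuda(void *buffers[], void *cl_arg);

static struct starpu_codelet scal_cl = {
    .cpu_funcs  = { scal_cpu },   /* chosen when scheduled on a CPU core */
    .cuda_funcs = { scal_cuda },  /* chosen when scheduled on a CUDA device */
    .nbuffers   = 1,              /* one piece of data is accessed */
    .modes      = { STARPU_RW },  /* the vector is read and written */
};
```

At submission time, StarPU picks whichever implementation matches the resource the scheduler selected, so the same task description works on any mix of CPUs and GPUs.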
StarPU takes care of scheduling and executing these codelets as efficiently as possible across the entire machine. To spare programmers from explicitly writing data transfers between computation resources, StarPU provides high-level data management that ensures memory coherency across the whole machine: before a codelet is executed (on an accelerator, for example), all of the data it needs are transparently made available on that resource. Thanks to its expressive interface and portable scheduling policies, StarPU achieves high performance by exploiting all of the machine's computation resources simultaneously, simply and efficiently. To take further advantage of the heterogeneity of the machine, StarPU uses scheduling strategies based on performance models whose parameters are set automatically (auto-tuned).
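This data management can be sketched with StarPU's C API as follows; `scal_cl` stands for some already-defined codelet (a hypothetical name here), and error checking is omitted:

```c
#include <starpu.h>

extern struct starpu_codelet scal_cl;  /* hypothetical codelet */

void scale_vector(float *vec, unsigned n)
{
    starpu_data_handle_t handle;
    starpu_init(NULL);

    /* Register the vector once; StarPU now tracks every valid copy. */
    starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
                                (uintptr_t) vec, n, sizeof(vec[0]));

    /* If the task runs on a GPU, StarPU transfers the vector there first. */
    struct starpu_task *task = starpu_task_create();
    task->cl = &scal_cl;
    task->handles[0] = handle;
    starpu_task_submit(task);

    starpu_task_wait_for_all();
    starpu_data_unregister(handle);  /* brings the valid copy back to vec */
    starpu_shutdown();
}
```

Note that the application never issues a `cudaMemcpy` or equivalent: the transfers are inferred from the registration and the codelet's access mode.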
In this way, StarPU provides algorithms enabling the following to be taken into account:
- The implementation of tasks for GPUs and CPUs
- Management of task graphs, expressed either through pragmas available via a high-level GCC plugin provided by StarPU, or through a rich C API, also provided by StarPU
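With the C API, such a graph emerges implicitly from data access modes: submitting tasks that read and write the same handle is enough for StarPU to infer their ordering. A minimal sketch, where the codelet names are hypothetical:

```c
#include <starpu.h>

extern struct starpu_codelet produce_cl, consume_cl;  /* hypothetical codelets */

void submit_pipeline(starpu_data_handle_t handle)
{
    /* The second task reads what the first writes, so StarPU
     * automatically makes it wait for the first to complete. */
    starpu_task_insert(&produce_cl, STARPU_W, handle, 0);
    starpu_task_insert(&consume_cl, STARPU_R, handle, 0);
    starpu_task_wait_for_all();
}
```

Submission is asynchronous, so the application can keep building the graph while earlier tasks execute.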
StarPU thereby provides the following real-time solutions:
- Management of task dependencies
- Optimised scheduling for heterogeneous resources
- Replication and transfer of optimised data between the main memory and the memories linked to the computation resources
- Optimised communications for clusters
The STORM research team (STatic Optimizations, Runtime Methods) works in the field of networks, systems and services, and distributed computing, with a particular focus on the theme of distributed and high-performance computing.