Photos, videos, important documents... cloud storage has been an integral part of our daily lives for over a decade. Most user data is stored by large service providers, who have the means to build data centers capable of handling large volumes of information.
As a result, users must trust cloud providers to keep their data private while having little control over how it is used. Indeed, the terms of service of the major players may grant their automated systems, employees or trusted third parties permission to access and exploit user data.
An "Airbnb" of data storage
This is the problem the start-up Hive is tackling by developing a peer-to-peer cloud: an alternative to existing solutions that provides both computing and data storage through a peer-to-peer network rather than a centralized infrastructure.
"Hive proposes to exploit the unused capacity of computers, and encourages users to bring their computing resources to the network, in exchange for similar network capacity or monetary compensation," explains Claudia-Lavinia Ignat, head of the COAST project-team (Inria Nancy - Grand Est) and of the Alvearium Challenge for Inria.
In practice, users connect to the network, identify themselves on the Hive platform and decide how much of their resources to share (100 GB of storage space, for example). The service is free: users pay only if they consume more than they share.
By sharing their resources, users can then benefit from all the services of a cloud while keeping their data confidential, since the data is fragmented, encrypted and dispersed across the peer-to-peer network. Users control access to their data by sharing the location of the fragments and their decryption keys directly, and only, with trusted users, without having to store them with a central authority. "The risk of a privacy breach is reduced because, if a node of the peer-to-peer network is attacked, only a small portion of the protected data is exposed. There is no longer a single place where an attacker can go to get all the data. This makes the attacker's job much more complicated," says Claudia-Lavinia Ignat.
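The fragment, encrypt and disperse idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Hive's design: fixed-size fragments, round-robin placement, a hypothetical `storage` dict standing in for the peers, and a toy SHA-256 keystream in place of a real cipher (a production system would use an authenticated cipher from a vetted library). The owner keeps the key and the manifest of fragment locations and shares both only with trusted users; each peer holds ciphertext for a fraction of the file, and a fragment altered by a peer is detected through its content hash.

```python
import hashlib
import secrets

# Toy illustration only: a SHA-256 counter-mode keystream stands in for a real,
# authenticated cipher. Do not use this construction to protect real data.
def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))

def fragment_id(nonce: bytes, ciphertext: bytes) -> str:
    # Content address of the encrypted fragment, used to detect tampering.
    return hashlib.sha256(nonce + ciphertext).hexdigest()

def store(data: bytes, peers: list[str], n_fragments: int, storage: dict) -> tuple[bytes, list]:
    """Split, encrypt and disperse `data`; return the key and the fragment manifest."""
    key = secrets.token_bytes(32)              # per-file key, kept by the owner
    size = -(-len(data) // n_fragments)        # ceiling division
    manifest = []
    for i in range(n_fragments):
        chunk = data[i * size:(i + 1) * size]
        nonce = i.to_bytes(12, "big")          # unique per fragment under this key
        ciphertext = xor(chunk, keystream(key, nonce, len(chunk)))
        peer = peers[i % len(peers)]           # hypothetical round-robin placement
        frag_id = fragment_id(nonce, ciphertext)
        storage.setdefault(peer, {})[frag_id] = ciphertext
        manifest.append((i, peer, frag_id))
    return key, manifest

def retrieve(key: bytes, manifest: list, storage: dict) -> bytes:
    """Fetch, verify and decrypt fragments; requires both the key and the manifest."""
    parts = []
    for i, peer, frag_id in sorted(manifest):
        ciphertext = storage[peer][frag_id]
        nonce = i.to_bytes(12, "big")
        if fragment_id(nonce, ciphertext) != frag_id:
            raise ValueError(f"fragment {i} on {peer} was modified")
        parts.append(xor(ciphertext, keystream(key, nonce, len(ciphertext))))
    return b"".join(parts)


storage = {}
data = b"Quarterly report: " * 20
key, manifest = store(data, ["peer-a", "peer-b", "peer-c"], 6, storage)
print(retrieve(key, manifest, storage) == data)  # True
print(len(storage["peer-a"]))                    # 2
```

Compromising one peer in this sketch yields only two encrypted fragments out of six; without the key and the manifest, they cannot be reassembled or read.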
Another advantage of the system proposed by Hive is that the participating nodes are independently owned and operated, so the cost of administering the system is shared. This should make data storage and sharing more affordable for everyone. It also reduces energy waste by bringing compute and storage resources closer to users and avoiding the energy overhead of data centers, such as cooling, which accounts for about 40 percent of a data center's total energy consumption.
The more users join, the more the network scales and the more resilient it becomes.
A Challenge for a sovereign cloud able to stand up to the American giants
To scale up, Hive and Inria have decided to work hand in hand, through a joint Challenge, toward the emergence of a sovereign cloud.
Hive currently offers a data storage solution for documents of all types, whether text or multimedia. These documents can be large, up to several tens of megabytes. However, they are immutable: they are read-only and cannot be modified. In this Challenge, Inria and Hive will work to extend the current solution to mutable data, i.e. data whose state can be modified after it has been created. "We are also targeting mutable data that can be modified collaboratively by different users," says Claudia-Lavinia Ignat, adding that "the main challenge is how to ensure data convergence in the presence of concurrent modifications."
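Convergence under concurrent modifications is commonly addressed with conflict-free replicated data types (CRDTs). The sketch below is a generic member of that family, not the technique chosen for the Challenge: a last-writer-wins map in which each entry carries a Lamport-style timestamp and the ID of the replica that wrote it. Because merge keeps the entry with the larger (timestamp, replica ID) pair, it is commutative, associative and idempotent, so replicas that have received the same updates reach the same state regardless of delivery order. The class name, field names and replica IDs are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class LWWMap:
    """State-based last-writer-wins map (a simple CRDT)."""
    replica_id: str
    clock: int = 0
    # key -> (timestamp, replica_id, value); (timestamp, replica_id) is a total order
    entries: dict = field(default_factory=dict)

    def put(self, key: str, value) -> None:
        self.clock += 1
        self.entries[key] = (self.clock, self.replica_id, value)

    def merge(self, other: "LWWMap") -> None:
        for key, entry in other.entries.items():
            current = self.entries.get(key)
            # Keep the write with the larger (timestamp, replica_id); equal
            # timestamps are broken deterministically by replica ID.
            if current is None or entry[:2] > current[:2]:
                self.entries[key] = entry
        # Later local writes must be ordered after everything seen so far.
        self.clock = max(self.clock, other.clock)

    def value(self) -> dict:
        return {key: entry[2] for key, entry in self.entries.items()}


# Two replicas edit the same document concurrently, then exchange states.
alice, bob = LWWMap("alice"), LWWMap("bob")
alice.put("title", "Draft A")
bob.put("title", "Draft B")           # concurrent with Alice's edit
alice.put("body", "Intro text")
alice.merge(bob)
bob.merge(alice)
print(alice.value() == bob.value())   # True
print(bob.value()["title"])           # Draft B
```

Last-writer-wins guarantees convergence by discarding one of two concurrent writes to the same key; collaborative text editing usually relies on richer CRDTs (for example, sequence CRDTs) that preserve both edits.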
Beyond ensuring the high availability of data (it is accessible at all times, and every request concerning it receives a response) as well as its consistency, Alvearium aims to guarantee that the data is stored securely. "We want to guarantee the confidentiality, integrity and accessibility of the data, i.e. that it is protected against unauthorized reading and cannot be modified by unauthorized parties," says Claudia-Lavinia Ignat.
Major collaborative service providers such as Dropbox, iCloud and Google Drive have indeed adopted encryption, storing only the encrypted version of shared documents. However, to make their services easier to use, these providers also store the encryption keys, which lets them decrypt the data and exposes it to various attacks. This project therefore aims to provide so-called "end-to-end" encryption techniques, so that only authorized peers can decrypt the data.
Four areas of work over the four years of the Challenge
To address these issues, the Challenge, named Alvearium, will bring together the skills of four Inria project-teams (COAST, which works on distributed collaborative systems; MYRIADS, which works on the Cloud and the management of resources in the Cloud; WIDE, which works on the theory and tools for large-scale and dynamic distributed systems; and COATI, which works on network optimization algorithms) to solve the problems encountered by Hive.
The Challenge is structured around four axes:
- Viable data placement and repair. Peer-to-peer storage needs a placement strategy that selects the most appropriate nodes for storing data while respecting certain constraints: compliance with the regulatory policies of authorities, and users' preferences in terms of security and privacy. The objective is also to provide repair mechanisms that respond to possible failures;
- The management of mutable data, i.e. data that can be modified after its creation, on peer-to-peer storage. Data sharing should be end-to-end encrypted, and only authorized peers should be able to decrypt the data. Concurrent changes can be merged once they have been received and decrypted by the authorized peers;
- Investigating new techniques for handling "Sybil attacks" (an attacker operating many fake identities) and "Byzantine failures" (nodes behaving arbitrarily or maliciously) in the context of untrusted distributed storage. The objective here is to offer stronger guarantees in terms of fault tolerance, data integrity and security;
- The development of a data security mechanism for secure storage. The objective is to propose a mechanism suited to distributed systems without a central authority that manages users' access rights to shared documents end to end, i.e. so that only the end users can decrypt them, which is not the case today with the major cloud providers.
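The first axis, placement under constraints followed by repair, can be illustrated with a simple greedy policy. This is a minimal sketch under stated assumptions, not the Challenge's algorithm: each node is described by hypothetical attributes (a region standing in for regulatory jurisdiction, the capacity its owner shares, and a reliability score such as observed uptime), hard constraints filter the candidates, the most reliable eligible nodes receive the replicas, and when a node fails a replacement is drawn from the remaining eligible nodes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    node_id: str
    region: str         # hypothetical: jurisdiction where the node is operated
    free_gb: float      # storage the owner has chosen to share
    reliability: float  # hypothetical score in [0, 1], e.g. observed uptime

def place(nodes, n_replicas, size_gb, allowed_regions, min_reliability):
    """Return n_replicas distinct nodes that satisfy every hard constraint."""
    eligible = [
        n for n in nodes
        if n.region in allowed_regions        # regulatory policy
        and n.reliability >= min_reliability  # user preference
        and n.free_gb >= size_gb              # enough shared capacity
    ]
    # Greedy choice: most reliable first, then most free space.
    eligible.sort(key=lambda n: (n.reliability, n.free_gb), reverse=True)
    if len(eligible) < n_replicas:
        raise ValueError("not enough eligible nodes for the requested replicas")
    return eligible[:n_replicas]

def repair(nodes, placement, failed_id, size_gb, allowed_regions, min_reliability):
    """Replace a failed replica holder with a new eligible node."""
    survivors = [n for n in placement if n.node_id != failed_id]
    candidates = [n for n in nodes if n.node_id != failed_id and n not in survivors]
    return survivors + place(candidates, 1, size_gb, allowed_regions, min_reliability)


nodes = [
    Node("n1", "FR", 50, 0.99), Node("n2", "DE", 20, 0.95),
    Node("n3", "US", 100, 0.999), Node("n4", "FR", 5, 0.90),
    Node("n5", "BE", 80, 0.97), Node("n6", "DE", 40, 0.93),
]
eu = {"FR", "DE", "BE"}
placement = place(nodes, 3, 10, eu, 0.9)
print([n.node_id for n in placement])  # ['n1', 'n5', 'n2']
placement = repair(nodes, placement, "n5", 10, eu, 0.9)
print([n.node_id for n in placement])  # ['n1', 'n2', 'n6']
```

The sketch deliberately leaves out concerns a real strategy would need, such as tracking capacity consumed by earlier placements and spreading replicas across distinct owners so that a single Sybil operator cannot hold every copy.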
Drawing on the skills of Inria's project-teams, these four areas together aim to deliver a sovereign, high-performance cloud that meets users' storage needs as effectively as existing providers while respecting the confidentiality and security of their data.
The specific contract between Inria and Hive was signed at the end of December 2022. Research for the Challenge begins this year, with three interns recruited at Inria and Hive at the beginning of February and four PhD students later in the year.