The protection of privacy on the Web is a subject that still contains many grey areas and unanswered questions, especially for the general public, which is often poorly informed, or not informed at all, about what happens to its data.
This is particularly true of cookies and other trackers found on websites. The vast majority of Internet users believe they can protect themselves simply by blocking third-party cookies, a belief shaped largely by the cookie banners that are now everywhere on the Web as a result of the ePrivacy Directive and the GDPR.
On the other hand, the Internet advertising industry, represented by a huge ecosystem generating trillions of dollars in revenue distributed among many players, is active on the Web every day to facilitate user profiling and, thus, ad selection.
This observation prompted the launch of the Greasy project, led by Inria researcher Nataliia Bielova from the Privatics team, with the aim of assessing the extent to which cookies are 'fat', i.e. what traces remain despite efforts by Internet users to clean them up or block them. This project has produced three major results, which are equally important for the scientific community, the general public and the legislative and regulatory authorities.
"Missed by Filter Lists", or how filters miss up to 30% of trackers
The first result, the foundation of this work, was unveiled in 2020. The Greasy researchers used a behavioural approach, which looks at the types of exchanges observed rather than at URLs, to demonstrate that current tracking detection tools miss between 25 and 30% of trackers.
"Until our work, both the scientific community and the general public thought that these filter lists were an effective solution for detecting and blocking trackers," explains Arnaud Legout, research director in the Diana research team at the Inria centre at the Université Côte d'Azur.
Moreover, Greasy's research has shown that filter lists cannot block these undetected requests without risking breaking the functionality of the site being visited. The reason: these cookies are set as first-party cookies (and not as third-party cookies), and are therefore impossible to block.
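The distinction between first-party and third-party cookies can be illustrated with a minimal Python sketch. It is not the Greasy researchers' tooling: the host names, cookie names and the `registrable_domain` helper are hypothetical, and the helper naively keeps the last two labels of a host name, whereas a real implementation would rely on the Public Suffix List.

```python
from typing import List, Tuple


def registrable_domain(host: str) -> str:
    """Naive registrable domain: the last two labels of the host name.

    Assumption for illustration only; real code should use the Public Suffix List.
    """
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(page_host: str, cookie_domain: str) -> bool:
    """A cookie is third-party when its domain differs from the visited site's."""
    return registrable_domain(page_host) != registrable_domain(cookie_domain)


# Hypothetical cookies: (name, cookie domain, host that served the script setting it)
cookies: List[Tuple[str, str, str]] = [
    ("ad_id", ".tracker.example", "cdn.tracker.example"),  # classic third-party cookie
    ("_uid", ".shop.example", "cdn.tracker.example"),      # set by a tracker, stored first-party
    ("cart", "www.shop.example", "www.shop.example"),      # functional first-party cookie
]
page_host = "www.shop.example"

# What remains once every third-party cookie has been blocked
surviving = [name for name, domain, _ in cookies
             if not is_third_party(page_host, domain)]
print(surviving)
```

In this toy example, blocking third-party cookies removes `ad_id` only. The identifier `_uid`, written by the tracker's script but stored under the site's own domain, is left untouched, and nothing in the cookie itself distinguishes it from the functional `cart` cookie.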
And there is currently no way to prevent this. "We live in a world that has changed because of the Internet. This whole world has been built on a mostly free ecosystem (Facebook, Twitter, Waze, Instagram, Google Maps, email...). But nothing is really free because all the services offered are 90% financed by advertising. Are we losing out? I don't know. What is important is that people understand what they are leaving behind and what it is costing them," says Arnaud Legout.
But what happens, then, if each Internet user browses privately, or cleans up their browser storage, in order to avoid being tracked?
"My Cookie is a Phoenix": tracking without tracking cookies
This is the second topic that the team behind Greasy has looked into, identifying for the first time a new technique (called cookie respawning with browser fingerprinting), which makes it possible to track a user across different websites, even if that user browses in private mode or clears their browser storage. Worse yet, this technique will allow tracking to continue even if Web browsers remove the ability to set tracking cookies.
"What we show in this paper is that the techniques we have detected, which today represent a very small percentage in Internet advertising, would make it possible to bypass the deprecation of third-party cookies, and therefore to continue tracking without third-party cookies," explains Arnaud Legout.
To do this, the researchers developed a methodology that allows them to detect the dependence of cookies on the characteristics of the browser and the machine. The results show that 1,150 of the top 30,000 Alexa websites deploy this tracking mechanism, going so far as to track users across multiple websites, even if third-party cookies are deprecated.
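The idea of detecting that a cookie depends on browser and machine characteristics can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the researchers' actual methodology: the `visit` callable stands for a crawl with a fresh browser profile, and `fake_site`, the feature names and the cookie names are all hypothetical.

```python
import hashlib
import secrets
from typing import Callable, Dict, List

Fingerprint = Dict[str, str]
# A visit with empty browser storage returns the cookies the site sets
Visit = Callable[[Fingerprint], Dict[str, str]]


def dependent_features(visit: Visit, baseline: Fingerprint,
                       alternatives: Fingerprint) -> Dict[str, List[str]]:
    """Map each respawned cookie to the fingerprint features it depends on."""
    first = visit(baseline)
    second = visit(baseline)  # storage cleared in between
    # Candidate: the same value comes back although storage was cleared
    respawned = {name: value for name, value in first.items()
                 if second.get(name) == value}
    result: Dict[str, List[str]] = {name: [] for name in respawned}
    for feature, alternative in alternatives.items():
        changed = dict(baseline, **{feature: alternative})  # change one feature only
        cookies = visit(changed)
        for name, value in respawned.items():
            if cookies.get(name) != value:
                result[name].append(feature)
    return result


def fake_site(fp: Fingerprint) -> Dict[str, str]:
    """Hypothetical site that rebuilds an identifier from two fingerprint features."""
    seed = (fp["user_agent"] + "|" + fp["timezone"]).encode()
    return {
        "uid": hashlib.sha256(seed).hexdigest()[:16],  # respawned from the fingerprint
        "session": secrets.token_hex(8),               # random, differs on every visit
        "lang": "en",                                  # constant, not an identifier
    }


baseline = {"user_agent": "UA-1", "timezone": "Europe/Paris", "language": "fr-FR"}
alternatives = {"user_agent": "UA-2", "timezone": "UTC", "language": "en-US"}

result = dependent_features(fake_site, baseline, alternatives)
print(result)
```

In this sketch, a cookie with a non-empty feature list, such as `uid`, would be flagged as respawned from the fingerprint. Constant cookies such as `lang` come back identical but depend on no feature, and random cookies such as `session` never reappear with the same value, so neither is flagged.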
The emergence of this new tracking technique is explained by Google's announcement of the upcoming removal (at the end of 2023) of third-party cookies in Chrome. That removal is a problem for worldwide Internet advertising, which is worth trillions of dollars thanks to tracking. "This means that even if third-party cookies are banned, trackers will be able to continue tracking, using first-party cookies and fingerprinting techniques," adds Arnaud Legout.
Medical sites and the GDPR: alarming results
Finally, Greasy researchers have more recently analyzed 385 health-related websites that users visit when searching for doctors in Germany, Austria, France, Belgium and Ireland, and shown that the majority do not comply with the GDPR. This is a major privacy issue, since Alphabet (Google's parent company, which controls more than 80% of online advertising) can acquire medical information about Internet users without or even against their consent.
"What we wanted to show here is that when you visit an e-commerce site that does tracking, it's not the same as when you visit the site of a health professional who does tracking. It doesn't give the same information. Knowing that you have bought the latest sneakers does not transmit the same information as knowing that you have made an appointment with an oncologist. We show that today, the level of privacy on medical websites is not higher than on commercial websites," says Arnaud Legout.
As the GDPR only authorizes the processing of health-related data with the explicit consent of the user, health-related websites must ask for consent before any data processing, especially when they integrate third-party trackers. Yet, according to Greasy's findings, at least one form of tracking is present on 62% of the health websites analyzed before any interaction with the consent pop-up, and 15% of the websites still include tracking after the user has rejected it.
"The sites on which we detected these flaws have been contacted, but we have to admit that it is very difficult for healthcare professionals to address these legal compliance issues, on the one hand because they usually outsource the management of their site, but also because the subject is difficult for a non-specialist to understand," concedes Arnaud Legout.
Since one of the objectives behind Greasy's work is to help with regulation, the results of this research have been shared with the CNIL to improve understanding of the tracking of Internet users' browsing history.