Personal data

Edward Lichtner - 20/12/2013

You are how you browse

Behavioural analysis algorithms are making it possible to profile Internet users with growing precision. Is Web privacy protection soon to be a thing of the past?

For some years now, the databases of the Web giants have been accumulating Internet users' personal data. First, there is the fairly detailed information we provide voluntarily in our Google, Facebook and Twitter profiles or submit to commercial websites. Then there are the tracks we leave as we move from site to site. ‘An Internet user linked to a Google account is like an open book,’ explains Jérémie Mary, lecturer at the Université de Lille and researcher in the Sequel project team at Inria Lille–Nord Europe. ‘Google searches are stored so that future results are more relevant to the user. Similarly, Gmail messages are scanned to identify personal tastes and select more targeted advertisements. Facebook and Twitter likewise record and analyse likes and retweets.’

From pitching to snitching

Although the main aim of these practices is to target users more effectively for advertising, the amount of information gathered exceeds what most of us are willing to provide. ‘Last year, Cambridge University showed that it was possible to deduce, with considerable accuracy, a user's sexual orientation, political views, religion, age, race and likelihood of drug use simply by interpreting their Facebook likes,’ Mary says. Internet users can no longer afford to ignore the fact that current algorithmic analysis methods open the door to potentially abusive practices. Encryption already offers some protection, but it is of no help against the tracks left while browsing or against profiles built from likes and other public information. The current debate centres on how to gather statistics while preserving as much anonymity as possible. ‘Our data belongs to us, and we should be able to restrict its use as we see fit, by setting time limits, conditions and other parameters,’ adds Mary.

Towards a customized Web

From March to June 2012, the Sequel project team at Inria Lille–Nord Europe teamed up with Yahoo to set a challenge for university research laboratories. Participants had to analyse the connection logs* of anonymous users in order to build profiles and personalize Yahoo News home pages accordingly. Rising click-through rates demonstrated the effectiveness of the approach.

Meanwhile, since December 2013, Facebook has enlisted Yann LeCun, a world-renowned expert in artificial intelligence. The world's leading social network hired him to develop deep-learning tools that can predict its users' activities and present them with personalized information.

* Connection logs record the sequence of Web pages viewed by Internet users.

Keywords: Personal data Privacy Deep learning SEQUEL