Social sciences are increasingly moving towards a quantitative approach

Big Data © Inria / Picture H. Raguet

A quantitative Big Data approach in the social sciences has made it possible to reveal the stratification of society by analysing social network and bank data. The work was carried out by Yannick Leo in the Dante team, and the results have been published in the Journal of The Royal Society Interface. The study relied on the considerable statistical power that a Big Data approach provides. Interview with Márton Karsai, senior lecturer at the French higher education institution ENS de Lyon, holder of an Inria chair and contributor to this international study.

How was this research organised?

Márton Karsai: This research, which began in 2014, follows on from earlier research work. It is the result of a collaboration between Dante, the joint Inria project team headed by Eric Fleury, which brings together French researchers (ENS de Lyon and Inria), researchers from the University of Buenos Aires and the company GranData Labs, which was in charge of managing the data. Inria's contribution covered the design, development and publication of the scientific project.

What have been its main findings?

M.K: The unequal distribution of wealth, together with “social homophily” (the tendency of similar people, or people with similar interests, to stay together), leads to a stratification of society. We are preferentially connected with people whose socio-economic status is similar to our own. Until now, there had been no statistical studies on this scale capable of confirming the results of the field work carried out by research teams in the social sciences. Our work consisted in empirically verifying this hypothesis by analysing a data set that combines the social network and the economic capacities of millions of individuals at the level of a country. We show that wealth, but also debt, is distributed unequally, and that people are linked together within a strongly stratified social structure, with a strong internal socio-economic correlation and the existence of highly interconnected "rich clubs". We also highlight the fact that people from the same class live closer to each other and that people’s income increases along with their daily commute time.

How is Big Data part of the social science approach?

M.K: The approach used in this study does not go against traditional research in the social sciences. On the contrary: it supplements the results found in previous research on smaller samples. Thanks to the significant statistical capacity that Big Data brings, we are in a position to provide evidence and to test hypotheses and observations on larger populations. In general, the social sciences are increasingly moving towards a quantitative approach thanks to a greater use of information technology.

Are you working on other projects that involve Big Data and the social sciences?

M.K: We are currently working on an ANR (French National Research Agency) project, SoSweet, which studies the French “twittosphere” in order to establish correlations between social networks and language. To carry out this study, we are using two distinct methodologies. The first consists in collecting 25% of all tweets sent over a two-year period by 2.5 million people, a total of 150 million tweets. The second is based on a questionnaire sent to a sample of the twittosphere, with questions about the socio-economic group and level of education of those surveyed. A certain number of results will be published this year.

The study, which brought together researchers from the Dante team, the University of Buenos Aires and GranData Labs, was based on two distinct data files. The authors first accessed telecommunications records from over 111 million anonymous users in a Latin American country between January 2014 and September 2015: the dates and durations of the communications, the anonymised IDs of the people connected and the location of the antenna were analysed. To respect privacy, the content of the conversations was not revealed. To estimate the individual economic indicators, the researchers also used the bank transactions of more than 6 million people over a period of eight months, between November 2014 and June 2015. The study analysed the telephone interactions of 992,538 people connected by over 1.9 million links over several months. Analysing data on this population, drawn from several files, made it possible to reveal the social stratification. A certain number of precautions were, however, put in place: the files used received the go-ahead from the Mexican national banking commission, and their public circulation was not allowed.
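
To give a rough idea of what combining a communication network with an economic indicator can look like, here is a minimal Python sketch on invented toy data. It is not the method used in the study: the people, wealth values, single class threshold and function names are all hypothetical, and the baseline is a simple assumption (link endpoints paired at random). The sketch only illustrates the general idea of checking whether links fall inside the same socio-economic class more often than chance would suggest.

```python
from collections import Counter

def class_of(wealth, thresholds):
    """Class index = number of thresholds the wealth value reaches."""
    return sum(wealth >= t for t in thresholds)

def same_class_fraction(links, classes):
    """Share of links whose two endpoints belong to the same class."""
    same = sum(1 for a, b in links if classes[a] == classes[b])
    return same / len(links)

def random_baseline(links, classes):
    """Chance that two link endpoints drawn independently share a class.

    Assumption: endpoints are paired at random while each class keeps
    its share of link endpoints.
    """
    endpoints = Counter()
    for a, b in links:
        endpoints[classes[a]] += 1
        endpoints[classes[b]] += 1
    total = 2 * len(links)
    return sum((n / total) ** 2 for n in endpoints.values())

# Toy data (hypothetical): an economic indicator per anonymised person
# and a list of communication links between them.
wealth = {"a": 10, "b": 12, "c": 11, "d": 95, "e": 90, "f": 100}
links = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("c", "d")]
thresholds = [50]  # one cut-off, so two classes

classes = {person: class_of(w, thresholds) for person, w in wealth.items()}
observed = same_class_fraction(links, classes)
expected = random_baseline(links, classes)

print(f"observed same-class links: {observed:.3f}")
print(f"random baseline:           {expected:.3f}")
```

On this toy network, five of the six links connect people of the same class, which is well above the random baseline. A gap of that kind is what a homophily effect would look like in such a simplified setting.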