"Fact-checking" software for understanding the world in which we live
Ioana Manolescu, Research Director at Inria Saclay – Ile-de-France Research Centre, heads the OAK team, a joint team with Université Paris-Sud. She is currently working on a tool for deciphering content: "An application for rooting out the fibs told by politicians?" was how the newspaper Ouest France headlined an article published on Thursday, 3 September 2015.
Here we take a look back at Ioana Manolescu's career and at how this ambitious project came into being.
You lead the OAK research team. What are you working on?
IM: We are working on optimising the management of huge amounts of complex data by designing new and better tools.
Today's society produces more and more digital data, whereas 40 years ago databases were used only for bank data. In the 1960s, bank accounts were the first mass public application of computer data. In particular, computerisation made it possible to create electronic payment cards.
Now however, electronic data is used in very many areas of life. A search could be used to find out how many times a month you write to your parents, or who you spend the most time with on Skype.
These new data do not have the same format as the data we have been processing for the last 40 years, so we need new tools to manage them. Furthermore, there are far more data, and they are more complex.
How did this project to design fact-checking software come about?
IM: It goes back quite a long way and for me, it started in 2012, the year that election campaigns were held in France and in the United States. At the time, groups of people started working on what we call "fact checking". Traditionally, this is what journalists do when they need to ensure that they do not publish factual errors or interpretations that could mislead the reader.
In 2012, associations and collectives started doing fact checking based on the statements made by public figures. They were good people, funded by private groups, whose aim was to see political discourse become clearer and more correct. In France, groups of journalists, mostly at Le Monde (Les Décodeurs) and Libération (Intox-Désintox), began checking and clarifying politicians' statements. Contemporary society and issues in the news are very complicated. To form an opinion on the issue of migrants, for example, or on unemployment or the effects of Monsanto pesticide, you first need to be able to understand what they are about! For me, as a database researcher, it is rather vexing to know that the information exists, that experts publish reports and studies, and that all of this can be accessed somewhere, and yet not to have access to it personally. The fact that information is available in electronic format on the Net does not mean that it can be accessed as easily and as quickly as one might wish.
I realised in 2012 that databases could be invaluable in resolving this problem.
In concrete terms, what does the software do?
IM: There are huge amounts of data that are relevant and accessible, and to which we have a right of access. I mean Open Data. This is not personal data but data created by government departments (for example, statistics on the number of births at different maternity wards, or on farm production of such-and-such a fruit in a certain region, etc.). We imagined searching all these data and analysing them, more or less automatically, to check what is being said in public.
My dream, for instance, would be to broadcast teletext, underneath statements made by politicians on TV, which would retrieve information from a body such as the Insee (France's National Institute for Statistics and Economic Studies) on the subject under discussion. This would enable the audience to immediately have an idea about the truth of what was being said, or to take their own analysis further: if a figure is stated, what has the trend been over the last 10 years? Is the situation similar in other comparable countries? and so on…
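As a rough illustration of the kind of check described above, the sketch below compares a stated figure with a reference series and reports how the series has changed over the period covered. It is only a minimal sketch under stated assumptions: the function name check_claim, the 5% tolerance and the series values are hypothetical placeholders, not real statistics and not the OAK team's software.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ClaimCheck:
    claimed: float             # figure stated in public
    reference: float           # figure found in the reference series
    relative_gap: float        # |claimed - reference| / |reference|
    within_tolerance: bool     # True if the gap is small enough
    change_over_period: float  # reference value minus earliest value in the series


def check_claim(claimed: float, series: Dict[int, float], year: int,
                tolerance: float = 0.05) -> ClaimCheck:
    """Compare a stated figure with the reference value for a given year.

    `series` maps a year to a value; `tolerance` is a hypothetical
    threshold chosen for this example only.
    """
    reference = series[year]
    if reference == 0:
        raise ValueError("reference value is zero; relative gap is undefined")
    relative_gap = abs(claimed - reference) / abs(reference)
    first_year = min(series)
    change = reference - series[first_year]
    return ClaimCheck(claimed, reference, relative_gap,
                      relative_gap <= tolerance, change)


# Placeholder series: invented values for illustration, not real statistics.
series = {2005: 100.0, 2010: 110.0, 2015: 120.0}
result = check_claim(130.0, series, 2015)
print(result.within_tolerance, result.change_over_period)
```

A real system would also need to decide which series and which year a statement refers to, which is the harder part of the problem.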
In technical terms, how will it work?
IM: We will use text analysis tools, so that what is said can be understood and analysed and the entities mentioned can be identified, together with databases, in which we will have stored the data we have available and the statements made over time. Semantic information will be used to interpret what we have extracted from the text: we will thus be able to recognise that an MP of such-and-such a party is an elected MP and how long his or her term of office is, so we can infer their stance on certain social issues, etc.
Combining text analysis, databases and semantics will provide a good analysis chain which can be used to extract the contextual information most relevant to the subject, as rapidly and as usefully as possible. I think that this will make political conversations much clearer! The idea is to set up a software platform, which we want to make very modular. People will be able to use just the parts of the software they need to create the application they require, tailored to the subject in which they are interested.
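The analysis chain described above (text analysis, then semantics, then a database lookup) can be sketched as a list of interchangeable stages. This is only an illustrative sketch: the stage names, the VOCABULARY mapping, the facts table and its values are all hypothetical, and the entity detection is a plain dictionary match standing in for real text analysis.

```python
import sqlite3
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Stage = Callable[[Context], Context]

# Hypothetical vocabulary mapping a term to a semantic type.
VOCABULARY = {
    "unemployment": "economic indicator",
    "births": "demographic indicator",
}


def extract_entities(ctx: Context) -> Context:
    """Text analysis stand-in: keep the words found in the vocabulary."""
    words = [w.strip(".,;:!?").lower() for w in ctx["text"].split()]
    ctx["entities"] = [w for w in words if w in VOCABULARY]
    return ctx


def add_semantics(ctx: Context) -> Context:
    """Semantic stage: attach a type to each detected entity."""
    ctx["types"] = {e: VOCABULARY[e] for e in ctx["entities"]}
    return ctx


def make_lookup(conn: sqlite3.Connection) -> Stage:
    """Database stage: fetch stored facts for each detected entity."""
    def lookup(ctx: Context) -> Context:
        facts: List[tuple] = []
        for entity in ctx["entities"]:
            facts.extend(conn.execute(
                "SELECT indicator, year, value FROM facts "
                "WHERE indicator = ? ORDER BY year", (entity,)).fetchall())
        ctx["facts"] = facts
        return ctx
    return lookup


def run_pipeline(text: str, stages: List[Stage]) -> Context:
    """Run the chosen stages in order over a shared context."""
    ctx: Context = {"text": text}
    for stage in stages:
        ctx = stage(ctx)
    return ctx


# In-memory database with placeholder rows (invented values, not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (indicator TEXT, year INTEGER, value REAL)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("unemployment", 2014, 1.0),
    ("unemployment", 2015, 2.0),
    ("births", 2015, 3.0),
])

statement = "Unemployment fell last year."
full = run_pipeline(statement, [extract_entities, add_semantics, make_lookup(conn)])
partial = run_pipeline(statement, [extract_entities])  # only one module used
print(full["facts"])
```

Because each stage takes and returns the same context dictionary, a user can run only the stages they need, which is one simple way to obtain the modularity mentioned above.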
How does the team working on this project function?
IM: The OAK team developed an initial tool of this sort in 2013, based on the ideas we had at the outset, which have since evolved. Two years ago, the director of the LIMSI laboratory (Université Paris-Sud and CNRS) put me in contact with Xavier Tannier, a researcher at the LIMSI who was working on similar subjects. We began working together out of a need to develop the existing tool and adapt it to the real needs of journalists. We went to meet the team of Les Décodeurs, at Le Monde, led by Samuel Laurent and made up of journalists, computer graphics designers and a programmer. We each had skills that were useful in furthering the project, so we decided to set it up together and submit a project proposal to the ANR (the French National Research Agency). Our project was accepted, and I am now the coordinator. It includes OAK, the LIMSI (especially X. Tannier), the Université de Rennes 1 (F. Goasdoué) and INSA de Lyon (especially S. Cazalens and P. Lamarre).
So what stage are you at now?
IM: The project will kick off in January 2016 and run for 4 years. It is a deeply collaborative project. Integrating the input of Xavier Tannier and our colleagues in Lyon and Rennes into the software developed by the OAK team means rethinking the software's architecture, which in practice means rewriting the code from scratch. To do that, Inria has shown its support by funding a post for an engineer who will join the team and start on re-engineering the software. The engineer will have support from people recruited through the ANR project.
You have won a Google Award for this project. What impact has that had?
IM: I had been told that Google was planning to open an award programme specifically for research in this field. With Xavier, we took a small idea from our ANR contract, developed it and submitted it under the call for proposals. Google has since sponsored this second project for a period of one year. We will start by working on the Google project; the ANR project will then benefit from its results. The Google project is a bit more applied, whereas the ANR project is more ambitious, more scientific, more long-term.
What can we wish you for the next stage in your research?
IM: Great developers! Because it really is an ambitious project. We are really entering unexplored territory. Databases were first used in the finance sector, and then gradually in trade and industry in general, but so far the Human Sciences have been left out. Yet this may well be the area in which most can be done. Traditionally, there has been a divide between what databases can manage and all other content. Our project aims to close that divide and make databases useful in more and more areas.
I am originally from Romania, and I lived under a dictatorship until I was 14. I think it is really important to vote, to have a say in how society is run. But freedom has no meaning if you don't understand the consequences of your choices. I often say, "I am highly qualified and have plenty of good will, but I don't understand my wage slip, with all the charges and deductions." Computer Science can help us to better understand the world in which we live.