Machine learning


With GuessWhat?!, when humans play, the computer learns the language

Locating an element in an image by asking a series of questions: that is the purpose of GuessWhat?!, an interactive game created by researchers in the Sequel project team in collaboration with a Canadian team. More than a simple game, it is a real technical challenge: teaching a computer to hold a natural dialogue about an image. The first, promising results have earned the team a publication at the highly prestigious CVPR, the biggest international conference in the field of computer vision.

GuessWhat?! an interactive game

"In machine learning, working solely on images or solely on text is quite a classic approach. It is when you start to combine the two that things become more complicated. And yet, image and text are intrinsically linked," points out Florian Strub, PhD student in the Sequel project team of Inria Lille - Nord Europe center (associated with the CNRS, the University of Lille - science and technology and University of Lille - humanities and social sciences*). For the last year the team, in collaboration with the University of Montreal, has been working on a machine learning research project that involves images and dialogue. Funded as part of the IGLU international project (CHIST-ERA programme), the project aims to show that language is learned by interacting with the outside world. To do this, the researchers initially envisaged a practical case: the design of a kitchen robot. "We had to work on the interaction with the robot," Florian Strub explains. "For example, it needed to understand instructions such as: 'Take the spoon in the third drawer on the left.' The computer therefore had to learn how to count and to find its way around its environment." That is when the researchers had the idea of developing this game-based approach to learning: GuessWhat?!.

The principle: two users play together. The first one selects an object in an image, and the second must find the object by asking a series of questions: Is the object red? Is it to the right of the image? Is it a car? And so on. For the researchers, the aim is to collect as many played games as possible so that the computer in turn learns to ask questions. "The concepts of left and right, counting or even colours come naturally to us. And yet, numerous computer models fail at this task. With GuessWhat?!, we are building an environment where the computer has no other choice but to use these concepts in order to succeed. What is even more difficult is that it must combine the notions it has learned in order to develop a coherent sequence of questions and find the hidden object."

Example of a game where the object to find is a traffic light: "Is the object green?" asked player 1 who does not see the image. "No" answered player 2... - © GuessWhat?!
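The mechanics of a game can be illustrated with a small sketch. It is only a toy under stated assumptions, not the researchers' system: the scene is a hand-written list of objects rather than an image, the questions are restricted to attribute checks, and the names `SceneObject`, `oracle_answer` and `play` are hypothetical. Each yes/no answer simply narrows the list of objects that could still be the hidden one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical stand-in for an object annotated in an image."""
    name: str
    category: str
    color: str
    side: str  # "left" or "right" half of the image


def oracle_answer(target: SceneObject, attribute: str, value: str) -> bool:
    """Yes/no answer to: is the hidden object's <attribute> equal to <value>?"""
    return getattr(target, attribute) == value


def play(scene, target, questions):
    """Ask the questions in order, keeping only objects consistent with the answers."""
    candidates = list(scene)
    dialogue = []
    for attribute, value in questions:
        answer = oracle_answer(target, attribute, value)
        dialogue.append((attribute, value, answer))
        candidates = [
            obj for obj in candidates
            if (getattr(obj, attribute) == value) == answer
        ]
        if len(candidates) == 1:  # only one object left: stop asking
            break
    return candidates, dialogue


scene = [
    SceneObject("car_1", "car", "red", "left"),
    SceneObject("car_2", "car", "blue", "right"),
    SceneObject("light_1", "traffic light", "green", "right"),
    SceneObject("person_1", "person", "red", "right"),
]
target = scene[1]  # the object chosen by the first player
questions = [("color", "red"), ("side", "right"), ("category", "car")]
candidates, dialogue = play(scene, target, questions)
```

In this run the answers are no, yes, yes, and the third question leaves `car_2` as the only candidate. The hard part described in the article, generating such questions in natural language from the pixels of an image, is exactly what this sketch leaves out.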

Mixing machine learning and images

Machine learning is precisely Sequel's area of expertise. The researchers in this project team at the Inria centre in Lille develop algorithms that solve sequential problems (for example, a sequence of questions) with a so-called "reinforcement" method, i.e. with a reward at the end. With GuessWhat?!, the reward is clear: finding the object hidden in the image. Nevertheless, processing the image remains a problem in itself. That is why, from the start of the project, the Sequel researchers have worked in close collaboration with the MILA laboratory (Montreal Institute for Learning Algorithms). "It is the biggest research laboratory in the field of deep learning applied to images," explains Harm de Vries, a University of Montreal student taking part in the project. "Our expertise lies in large-scale computer vision systems. The Sequel team, for its part, focuses on dialogue systems. These two skills are essential in order for the GuessWhat?! project to progress."

The project takes place in three phases. First of all, the researchers collected information from the 150,000 games played online by humans. Then, using this data, they trained a computer to ask questions by imitating a human being. The project is currently in its third phase: the computer itself plays a virtually unlimited number of games. It asks questions as it goes along, and learns from its mistakes. "In the beginning, the computer's questions make no sense. It must gradually learn to ask questions that are grammatically correct, and then questions that have a meaning. It must therefore train itself by carrying out thousands of attempts. For this, we created a second artificial intelligence that answers these questions with a yes or a no. The two computers interact, just like in a game of chess. With one difference! They do not play against each other, they work together..." If, in the end, a player thinks s/he is playing with a human and not a computer, then the researchers will have succeeded.
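The third phase, learning from a reward that only arrives at the end of the dialogue, can be sketched on a toy problem. Everything below is an assumption made for illustration: the article does not describe the team's models, so the sketch swaps in a textbook policy-gradient rule (REINFORCE with a running-average baseline) over a small table of preferences, and the scene, the constants and the names `remaining`, `softmax` and `train` are hypothetical. A questioner picks two yes/no questions, a scripted oracle answers truthfully, and the reward is 1 only if the final guess is right.

```python
import math
import random

# Hypothetical toy scene: 4 objects, each described by 3 yes/no attributes.
# Attribute 2 is identical for every object, so asking about it is useless.
OBJECTS = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
N_QUESTIONS = 3      # one possible question per attribute
DIALOGUE_LENGTH = 2  # questions asked before guessing


def remaining(target, asked):
    """Indices of objects consistent with the oracle's answers about `target`."""
    return [i for i, obj in enumerate(OBJECTS)
            if all(obj[q] == OBJECTS[target][q] for q in asked)]


def softmax(prefs):
    """Turn a list of preferences into a probability distribution."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=2000, lr=0.2, seed=0):
    rng = random.Random(seed)
    prefs = {}  # one preference vector per state (questions asked so far)
    baseline, rewards = 0.0, []
    for _ in range(episodes):
        target = rng.randrange(len(OBJECTS))
        asked, steps = [], []
        for _ in range(DIALOGUE_LENGTH):
            state = tuple(asked)
            probs = softmax(prefs.setdefault(state, [0.0] * N_QUESTIONS))
            q = rng.choices(range(N_QUESTIONS), weights=probs)[0]
            steps.append((state, q, probs))
            asked.append(q)
        # The only reward comes at the end: 1 if the final guess is correct.
        guess = rng.choice(remaining(target, asked))
        reward = 1.0 if guess == target else 0.0
        # REINFORCE update: reinforce the questions asked when the reward
        # beats the running average, discourage them otherwise.
        advantage = reward - baseline
        for state, q, probs in steps:
            for a in range(N_QUESTIONS):
                grad = (1.0 if a == q else 0.0) - probs[a]
                prefs[state][a] += lr * advantage * grad
        baseline += 0.05 * (reward - baseline)
        rewards.append(reward)
    return prefs, rewards


prefs, rewards = train()
```

In this toy, asking about attributes 0 and 1 (in either order) always isolates the hidden object, while the useless question or a repeated question leaves the guess partly to chance. Comparing the average of `rewards` over early and late episodes is one way to check whether the questioner has picked up that strategy; no particular figure is claimed here.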

The computer's results have already been surprising. On this type of task, a human generally manages to find the object almost 9 times out of 10. With the most basic algorithms, a computer has a 35-40% success rate. With the latest models designed as part of the GuessWhat?! project, the computer reaches a 55% success rate and, according to the researchers, this is just the beginning. "When we analyse the questions asked, we notice that the computer understands the notions of left and right, and the majority of relations between the objects. On the other hand, it still struggles to ask questions concerning colours. However, the real success is that it asks a series of questions in a logical order. In our models, the computer learns to develop a strategy, and that is a first in this type of scenario."

This success has already brought the team some prestigious recognition. Numerous web companies are closely following this research project, and the team that created the game has just had two articles accepted at CVPR and IJCAI, the most important international conferences in computer vision and artificial intelligence respectively. "This is wonderful recognition. It proves to us that we are on the right track and that this problem is worth studying," Florian Strub concludes.

*at UMR 9189, CNRS-Centrale Lille-University of Lille - science and technology, CRIStAL.

Keywords: Game Artificial intelligence Machine learning Deep learning