Artificial intelligence: from fantasy to reality
Benoît Sagot, who was recently appointed to the prestigious “IT and Digital Science” chair at the Collège de France for the 2023-2024 academic year, aims to help people understand artificial intelligence (AI) systems, which are used, among other things, to generate text automatically from written instructions. Sagot is head of ALMAnaCH, a project team at the Inria Paris Centre specialising in natural language processing (NLP) and digital humanities. He also holds a chair at PRAIRIE, an interdisciplinary institute for research in artificial intelligence.
Benoît Sagot feels that the need to explain this technology is more pressing than ever, at a time when there is “so much noise, much of it toxic” around generative AI agents, as well as “a lot of scaremongering”. In his view, the public release of ChatGPT, developed by the US organisation OpenAI, “doesn't really constitute a scientific or technological revolution, but it has given ordinary people an opportunity to play with it, and a glimpse of the major changes it could bring about in various different fields. ChatGPT shows the significant progress made in NLP over the past 20 years or so, including for mass-market applications such as spell checking and machine translation.”
The more languages, the harder it gets for researchers
Benoît Sagot's own career illustrates how research in NLP has moved forward: “When I started out back in 2002 with the now defunct project team ATOLL, I did a lot of work on formalised lexicons and grammars, as well as on syntactic analysis, the analysis of the grammatical structure of sentences”, explains the researcher, who was drawn to this field by the chance to combine his two loves: languages and computing. “I continued my research in NLP with the Alpage project team, the forerunner to ALMAnaCH, while expanding my research to computational linguistics, which involves studying linguistics from a quantitative and computational perspective.”
This work covered a number of different languages. “It was important to analyse a diversity of languages and how they function in order to understand why some things may be applicable in certain languages but not others”, says Benoît Sagot. In addition to English and French, the director of research worked to varying extents on a range of other languages. About ten years ago he co-supervised a PhD on “the segmentation of Mandarin”, a language that is difficult to process with computer tools: “There are no spaces between words, meaning you need to find another way of identifying them for the purposes of analysis and processing.” He also co-founded the startup Opensquare, where he developed systems for analysing surveys of employees at major international companies whose staff speak dozens of different languages.
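To make the Mandarin example concrete: one classic baseline for segmenting text written without spaces is greedy “forward maximum matching” against a word list. The Python sketch below is purely illustrative and is not the method studied in the PhD mentioned above; the function name `max_match`, the tiny lexicon and the 4-character window are assumptions made for this example. Modern systems typically learn segmentation from annotated corpora instead of relying on a fixed dictionary.

```python
# Illustrative sketch only: greedy forward maximum matching.
# The lexicon and the 4-character window are hypothetical choices for this example.

def max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Split `text` by always taking the longest lexicon word starting at the current position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted as a fallback.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical mini-lexicon ("we", "like", "natural", "language", "natural language", "processing").
LEXICON = {"我们", "喜欢", "自然", "语言", "自然语言", "处理"}

if __name__ == "__main__":
    print(max_match("我们喜欢自然语言处理", LEXICON))
    # prints ['我们', '喜欢', '自然语言', '处理']
```

Greedy matching is fast, but it can choose the wrong split when candidate words overlap, and it fails on words missing from the dictionary, which helps explain why segmentation remained a research problem.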
Machine learning for language: a booming sector
To tackle these challenges, ALMAnaCH researchers can count on ever-increasing computing capacity, drawing on machine learning technology while also contributing to its development. “Major progress has been made in natural language processing (a sub-domain of artificial intelligence) in recent years thanks to the widespread adoption of neural networks”, says Benoît Sagot. The purpose of these networks is to enable computers to analyse and process data in a way that is inspired, albeit remotely, by the workings of the human brain. Neural networks are used for both supervised learning (from annotated examples) and unsupervised learning (from raw data), chiefly through deep learning, which relies on large neural networks.
Educating the public and continuing to innovate
A renowned expert in the field, Benoît Sagot is delighted to now have the chance to present these breakthroughs at the Collège de France. “It is a real honour to have been given this opportunity. This is a social issue with significant implications. My goal is to give as many people as possible the keys to understanding it.”
The chair will run from 30 November 2023 to 9 February 2024, with a one-hour class each week, followed by an hour-long lecture by a guest speaker. Catch-up videos of each class will be made available on the Collège de France website.
The first class (on 30 November 2023 at 6pm), entitled “Teaching Languages to Machines”, will introduce natural language processing in its historical context and give an overview of the current state of the discipline. Subsequent classes will look at textual data and how it can be represented, followed by introductions to symbolic and probabilistic approaches, language models, contemporary neural network approaches, machine translation systems, the challenges raised by chatbots, and current research in multimodality (combining text with speech or with images).
Looking further ahead: making models more frugal
The chair will also give the wider public an opportunity to learn about the research topics that ALMAnaCH considers priorities. “One of the biggest challenges facing us over the months and years to come is frugality”, says Benoît Sagot. “Language models and chat models are very expensive. Ideally we wouldn't need so many computing resources or so much training data to produce them, particularly for languages where there is not much textual data available.”
Other challenges include robustness, meaning the ability of applications to handle texts that differ from the more common forms of language, and “alignment”, a term that refers to the capacity of generative AI systems to respect specific principles and values. These ambitious goals give Benoît Sagot and his team plenty of motivation.
The aim of my classes at the Collège de France will be to introduce the wider public to the most important research currently being carried out in natural language processing. I believe that it's important to shine a spotlight on a subject that has received a lot of publicity over the past year following the release of ChatGPT.
Head of the ALMAnaCH project team, and visiting professor at the Collège de France
Benoît Sagot’s brief bio
2000: graduated from the École Polytechnique.
2002-2006: doctoral student with the ATOLL project team (Atelier d’outils logiciels pour le langage naturel - Natural Language Software Tools Studio) at Inria Rocquencourt.
2006: PhD on “Automated analysis of French: lexicons, formalisms and parsers”, Paris-Diderot University (Paris 7).
2007-2016: Inria research fellow with the Alpage project team (Analyse linguistique profonde à grande échelle - Deep linguistic analysis on a large-scale), before being made head of this team.
2017 to present: head of the ALMAnaCH project team (Automatic Language Modelling and Analysis & Computational Humanities).
2019 to present: holder of a chair at PRAIRIE (Interdisciplinary Research and Education in AI), an interdisciplinary institute.
Find out more about the annual chair in “IT and Digital Science” (all in French)
- Collège de France press release: « Apprendre les langues aux machines - Leçon inaugurale » (Teaching Languages to Machines: inaugural lecture) (PDF).
- More about the Collège de France annual chair « Informatique et sciences numériques » (IT and Digital Science).
- Interview with Benoît Sagot: "La frontière entre ingénierie et recherche se déplace vite" (The boundary between engineering and research is shifting fast).
Find out more about AI and automatic language processing
- Benoit Sagot et Aaron Hertzmann parlent d'IA (Benoît Sagot and Aaron Hertzmann talk about AI), conference at the Inria Paris Centre on 23/11/2023, Inria.
- [AI and its challenges] “An introduction to deep learning, a crucial component of modern AI” (video), lecture given by Benoît Sagot at a conference organised by the Campus de l’Innovation pour les Lycées (part of the Collège de France) and by SciencesPo on 28/9/2023.
- Ethics and chatbots (podcast), Interstices (in French), 4/9/2023.
- New technology: Do accents need to be “erased” by artificial intelligence?, 20 Minutes (with The Conversation, in French), 18/1/2023.
- “Large-scale Language Models & Their Training Corpora” (video in English), lecture by Benoît Sagot at the Czech-French AI workshop organised by the Czech Ministry of Foreign Affairs and the French Embassy in Prague on 12 and 13/9/2022.
- BigScience has big ambitions for language models, CNRS Le Journal, 12/7/2022.
- Limiting divergence in legal rulings thanks to artificial intelligence, Inria, 21/2/2022.