Natural language processing is one of the cornerstones of artificial intelligence. It is currently used in a range of different ways, from answering queries online or automatic translation to simplifying and summarising texts.
Despite the progress made in recent years, owing in no small part to the availability of large text corpora and the development of increasingly powerful learning algorithms, automatic language analysis systems still do not allow machines to understand texts and languages to the extent that humans can. These systems still struggle to predict how meaning changes with context, to combine statistical and symbolic information, and to make the predictions they produce easier to explain and interpret.
These issues were raised in January at a joint meeting between Inria and the DFKI, during which the researchers examined in detail the scientific questions involved, the methods that could be employed and the fields in which experiments could be carried out. They also discussed possible collaborations.
Three Inria project teams (Magnet, Sémagramme and Multispeech) and one DFKI team (MLT), all of whom work with and on language-based data, decided to put together a project, which they christened IMPRESS. The aim of this project is to develop models capable of combining symbolic knowledge and deep learning in order to represent the meaning of words and wider linguistic expressions. The field of study will be multimodal (text and video), making it possible to evaluate the methods used and to work with types of knowledge other than lexical knowledge.
"We are planning to centre our work around jointly supervised PhDs, joint seminars and existing synergies (Nancy and Saarbrücken, for example, are already part of the same Erasmus Mundus Master's programme)," says Pascal Denis, Inria researcher and joint project leader.
Better results from complex automatic language processing tasks
Each of the teams involved in IMPRESS will be able to bring their own specific perspective and expertise on the subject: Magnet and Multispeech will contribute their expertise in statistical and machine learning for automatic language processing and resource development; Sémagramme will contribute their expertise in symbolic models and resource development; while MLT will contribute their expertise in human-machine dialogue, translation, development and multimodality (language and videos). Three specific targets have been set:
- To determine and develop methods for injecting knowledge, particularly lexical and semantic knowledge, into the multidimensional numerical representations used in deep learning (obtained exclusively from language-based data), in order to improve results on higher-level tasks such as anaphora resolution (i.e. identifying which entity a pronoun refers to).
- To determine and develop methods for injecting this knowledge into the representations used in multimodal systems (obtained from both language-based data and videos).
- To develop and distribute free software implementing these methods.
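The article does not specify which injection methods the teams will use. One well-known instance of the first objective, injecting lexical knowledge into learned word representations, is embedding retrofitting, in which each word vector is nudged toward the vectors of its neighbours in a lexical resource while staying close to its original, corpus-derived position. The sketch below is purely illustrative and is not the IMPRESS method; the toy vectors, the `lexicon` list and the `retrofit` helper are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy data: 3-dimensional embeddings for a few words, plus a
# tiny lexicon of related-word pairs standing in for symbolic knowledge.
embeddings = {
    "film":  np.array([1.0, 0.0, 0.0]),
    "movie": np.array([0.0, 1.0, 0.0]),
    "actor": np.array([0.0, 0.0, 1.0]),
}
lexicon = [("film", "movie")]  # pairs the lexicon marks as related

def retrofit(embeddings, lexicon, alpha=1.0, iterations=10):
    """Pull each vector toward its lexicon neighbours while keeping it
    close to its original (distributional) position.

    Simplified update: q_w = (sum of neighbour vectors + alpha * original
    vector) / (number of neighbours + alpha).
    """
    original = {w: v.copy() for w, v in embeddings.items()}
    new = {w: v.copy() for w, v in embeddings.items()}
    neighbours = {w: set() for w in embeddings}
    for a, b in lexicon:
        neighbours[a].add(b)
        neighbours[b].add(a)
    for _ in range(iterations):
        for w in new:
            nbrs = neighbours[w]
            if not nbrs:
                continue  # no symbolic knowledge for this word: leave it
            total = sum(new[n] for n in nbrs) + alpha * original[w]
            new[w] = total / (len(nbrs) + alpha)
    return new

retrofitted = retrofit(embeddings, lexicon)
# "film" and "movie" end up closer together than in the original space,
# while "actor", absent from the lexicon, is untouched.
```

After retrofitting, the distance between the vectors for "film" and "movie" shrinks, while words with no lexicon entries keep their original positions: the symbolic resource reshapes only the part of the space it has something to say about.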
"What we are looking for is better results on complex automatic language processing tasks, to offer linguistic and software resources drawing on this, and to make the results obtained through deep learning easier to explain," explains Pascal Denis.
IMPRESS was given the go-ahead to begin research on 29 May and is now awaiting the arrival of the newly recruited PhD students and engineers to get its in-depth scientific work up and running.