ALMANACH Research team

Activity reports

Overall Objectives

The ALMAnaCH team, created as an Inria team (“équipe”) on 1 January 2017, brings together specialists of a pluridisciplinary research domain at the interface between computer science, linguistics, philology, and statistics, namely that of natural language processing, computational linguistics, and digital and computational humanities.

Computational linguistics is an interdisciplinary field dealing with the computational modelling of natural language. Research in this field is driven both by the theoretical goal of understanding human language and by practical applications in Natural Language Processing (hereafter NLP) such as linguistic analysis (syntactic and semantic parsing, for instance), machine translation, information extraction and retrieval, and human-computer dialogue. Computational linguistics and NLP, which date back at least to the early 1950s, are among the key sub-fields of Artificial Intelligence.

Digital Humanities (hereafter DH) is an interdisciplinary field that uses computer science as a source of techniques and technologies, in particular NLP, for exploring research questions in the social sciences and humanities. Computational Humanities aims at improving the state of the art in both computer science (e.g. NLP) and the social sciences and humanities, by involving computer science as a research field.

ALMAnaCH is a follow-up to the ALPAGE project-team, which came to an end in December 2016. ALPAGE was created in 2007 in collaboration with Paris-Diderot University and had the status of a UMR-I since 2009. This joint team, involving computational linguists from Inria as well as computational linguists from Paris-Diderot University with a strong background in linguistics, proved successful. However, the context is changing, with the recent emergence of digital humanities and, more importantly, of computational humanities. This presents both an opportunity and a challenge for Inria computational linguists. It provides them with new types of data on which their tools, resources and algorithms can be used, leading to new results in the human sciences. Computational humanities also provide computational linguists with new and challenging research problems which, if solved, provide new ways of addressing research questions in the humanities.

The scientific positioning of ALMAnaCH therefore extends that of ALPAGE. We remain committed to developing state-of-the-art NLP software and resources that can be used in academia and in industry, including recent approaches based on deep learning. At the same time, we continue our work on language modelling in order to provide a better understanding of languages, an objective that is reinforced and addressed in the broader context of computational humanities, with an emphasis on language evolution and, as a result, on ancient languages.

This new scientific orientation has motivated the creation of a new project-team at the crossroads between different scientific networks, and in particular:

The École Pratique des Hautes Études (EPHE), with which collaboration has already started on a number of topics related to Digital and Computational Humanities; when the ALMAnaCH team was created in January 2017, two EPHE permanent members were involved: Marc Bui, Directeur d'Études Cumulant, a specialist of computational humanities and of the computational modelling of the concept of proximity, and Daniel Stökl Ben Ezra, Directeur d'Études, a specialist of digital and computational humanities, Hebrew and Aramaic language, literature, palaeography and epigraphy. Since then, discussions and joint research endeavours have been initiated, showing the great potential of such a collaboration. Joint project proposals were submitted, one of which was successful, and we plan to work on future proposals in the coming months and years. Yet after extensive discussions among all members involved in the team, as well as with Éric Fleury, the head of Inria Paris, and François Jouen, Dean of the Natural Sciences department at EPHE, we jointly came to the conclusion that ALMAnaCH was not the optimal level for setting up a large-scale collaborative environment between the two institutions, as the potential for collaboration between Inria Paris and EPHE goes well beyond NLP and text-based digital humanities. Discussions on a future Framework Agreement between EPHE and Inria Paris have started, in which ALMAnaCH will play a key role. In this context, several EPHE non-permanent members are still hosted at Inria Paris, within ALMAnaCH offices, in order to ease joint collaborations.

The Berlin Brandenburg Academy of Sciences in Berlin, which hosts the national lexicographic project in Germany, funded by the German Ministry of Education and Research (BMBF).

CNRS's Institut des Sciences de la Communication (Institute for Communication Sciences), on topics pertaining to Digital Social Sciences; ALMAnaCH hosted Tommaso Venturini, then on a fixed-term Senior Researcher position, in September and October 2018, in the context of his involvement in one of ALMAnaCH's projects, the SoSweet project on Twitter-based sociolinguistics. He was granted a permanent position as CNRS Chargé de Recherches at the Institut des Sciences de la Communication starting in November 2018, and we intend to collaborate further in the future.

If confirmed, the PRAIRIE Institute (PaRis Artificial Intelligence Research Institute), whose goal will be to act as a catalyst for research in Artificial Intelligence and for exchanges between academia, industry and higher education in this domain, in which NLP plays a key role.