ALMANACH

The ALMAnaCH project-team (Automatic Language Modelling and Analysis & Computational Humanities) is a multidisciplinary artificial intelligence (AI) team working in Natural Language Processing (NLP) and Digital Humanities (DH), at the crossroads of theoretical computer science, machine learning and linguistics. The team’s research covers a wide range of topics, including but not limited to neural language models, machine translation, dialogue modelling, language resource development (monolingual, parallel and annotated corpora, lexicons, etc.), interactive AI, evaluation strategies, information extraction, optical character recognition and handwritten text recognition. The team handles data from varied domains, including user-generated content, biomedical data, patents and historical documents. Its work also extends beyond text to multimodal processing involving speech and images. A cross-cutting theme of the team’s research is language variation in all its diversity (in terms of genre, style, register, and dialectal and diachronic variation), both as a challenge to current systems and as an object of study.

Inria centre(s)

Inria Paris Centre