ALMANACH Research team

Automatic Language Modelling and ANAlysis & Computational Humanities

  • Leader: Benoit Sagot
  • Type: team
  • Research center(s): Paris
  • Field: Perception, Cognition and Interaction
  • Theme: Language, Speech and Audio
  • Inria teams are typically groups of researchers working to define a common project and shared objectives, with the goal of creating a project-team. Such project-teams may include other partners (universities or research institutions).

Team presentation

The ALMAnaCH team (Automatic Language Modelling and Analysis & Computational Humanities) focuses on Natural Language Processing (NLP), a key area within Artificial Intelligence (AI) and Digital Humanities (DH), at the crossroads of theoretical computer science, machine learning, and linguistics. The team concentrates in particular on the syntactic and semantic parsing of natural languages, including noisy web-based data, using symbolic, statistical, neural and hybrid techniques. One of its most recent challenges is the integration of contextual information, both linguistic and non-linguistic, for instance within chatbot systems. The team is also involved in Digital and Computational Humanities, especially the study and modelling of linguistic variation, for instance to study ancient documents, model the evolution of languages, study the languages of the web, or contribute to the development of text simplification tools that make texts more accessible to everyone.

Research themes

  • Automatic Context-augmented Linguistic Analysis
    • Context-augmented processing of natural language at all levels: morphology, syntax, semantics
    • Information and knowledge extraction
    • Chatbots and text generation
  • Computational Modelling of Linguistic Variation
    • Theoretical synchronic linguistics
    • Sociolinguistic variation
    • Diachronic variation
    • Accessibility-related variation
    • Intertextual variation
  • Modelling and development of Language Resources
    • Construction, management and automatic annotation of Text Corpora
    • Development of Lexical Resources
    • Development of Annotated Corpora