ATOLL Research team

Software tools for natural language

  • Leader: Eric Villemonte de la Clergerie
  • Research center(s): CRI de Paris
  • Field: Symbolic systems
  • Theme: Management and processing of language and data

Team presentation

The automatic processing of natural language documents has become a major issue if we are to make efficient use of the huge amount of information available in the world. The problem grows more important every day with the increasing use of the Internet. Our project-team aims to develop tools and techniques, both theoretical and applied, to help access, process and use documents written in natural language.

Research themes

  • Parsing: Theoretical and practical investigation of parsing techniques for various grammatical formalisms used in Natural Language Processing. In particular, we focus on tabular techniques to handle ambiguities in language and design several parsing systems:

    • SYNTAX: This system may be used to compile deterministic or non-deterministic Context-Free Grammars [CFG].
    • Range Concatenation Grammars [RCG]: Introduced by Pierre Boullier, this hierarchy of grammars allows an efficient exploration of mildly context-sensitive [MCS] grammar formalisms. An implementation of RCG exists and has been used very successfully, for instance, for Tree Adjoining Grammars [TAG].
    • Automata and Dynamic Programming: Stack automata can describe various parsing strategies, and Dynamic Programming interpretations of these automata are derived to design tabular parsers. This approach is implemented within the DyALog system and works for various unification-based formalisms (DCG, Feature TAG, RCG, ...) and for logic programs.

  • Linguistic Infrastructure: ATOLL develops a workbench for TAG based on XML representations. This workbench includes parsers built with the RCG and DyALog systems, as well as servers that give access to these parsers, to the grammars, and to the derivation forests produced by the parsers.

  • Knowledge acquisition: In this emerging theme, we would like to explore the interactions between lexical knowledge and parsing. More knowledge about words may help parsing and, conversely, parsing may be used to extract knowledge from corpora.
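
As an illustration of the tabular idea mentioned under the Parsing theme, here is a minimal sketch of a chart-based (CYK-style) recognizer that counts the analyses of an ambiguous sentence. It is not taken from SYNTAX, RCG or DyALog: the toy grammar, the lexicon and the function name count_parses are assumptions made for this example, and CYK is used only because it is a compact, well-known instance of dynamic programming over a parse table.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (illustration only).
# BINARY maps a pair of right-hand-side symbols (B, C) to every A with A -> B C.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],
    ("NP", "PP"): ["NP"],
    ("P", "NP"): ["PP"],
    ("Det", "N"): ["NP"],
}
# LEXICON maps a word to the categories that can rewrite to it.
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(words, start="S"):
    """Return the number of distinct parse trees of `words` rooted in `start`."""
    n = len(words)
    # chart[(i, j)][A] = number of derivations of words[i:j] from symbol A.
    # Each cell is filled once and reused, which is what keeps ambiguity cheap.
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[(i, i + 1)][cat] += 1
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                left = list(chart[(i, k)].items())
                right = list(chart[(k, j)].items())
                for b, count_b in left:
                    for c, count_c in right:
                        for a in BINARY.get((b, c), []):
                            chart[(i, j)][a] += count_b * count_c
    return chart[(0, n)][start]
```

Under these assumptions, count_parses("I saw the man with the telescope".split()) returns 2, one analysis for each attachment of the prepositional phrase. Both analyses share the chart entries for "the man" and "with the telescope", which is the sharing that tabular techniques rely on to handle ambiguity.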

International and industrial relations

  • Action Normalangue: This French action concerns the standardization of linguistic resources.

  • ARC "Lexical Resources for TAGs" [RLT]: In cooperation with "Langue et Dialogue" (LORIA, Nancy) and TALaNa (University Paris 7). The main objective of this action is the semi-automatic acquisition of lexicon entries for TALaNa's French TAG grammar, using information obtained by parsing corpora.

  • ARC "Generation and Inference" [GENI]: In cooperation with "Langue et Dialogue", Orpailleur (LORIA), Lattice and ILPL (IRIT, Toulouse). ATOLL provides expertise on TAGs and is interested in some aspects of lexical semantics.

  • Action FASTLING: Bi-national action between ATOLL, CENTRIA (Lisbon) and LIFO (University of Orleans). This action extends an earlier one in which a robust Portuguese parser was developed using the DyALog system.

  • Action "botanic": This action, in collaboration with IRD, is still informal. The objective is to process botanical corpora, in particular by using linguistic techniques for text mining.

Keywords: Parsing, Natural language, Linguistics, Dynamic programming, Logic programming, Electronic document