ATOLL Research team
Software tools for natural language
- Leader: Eric Villemonte de la Clergerie
- Research center(s): CRI de Paris
- Field: Symbolic systems
- Theme: Management and processing of language and data
The automatic processing of natural language documents has become a major issue if we are to make efficient use of the huge amount of information available worldwide. This problem grows more important every day with the increasing use of the Internet. Our project-team aims to develop theoretical and applied tools and techniques that help users access, process and use documents written in natural language.
- Parsing: Theoretical and practical investigation of parsing techniques for the various grammatical formalisms used in Natural Language Processing. In particular, we focus on tabular techniques to handle ambiguities in language, and we design several parsing systems:
- SYNTAX: This system may be used to compile deterministic or non-deterministic Context-Free Grammars [CFG].
- Range Concatenation Grammars [RCG]: Introduced by Pierre Boullier, this hierarchy of grammars allows an efficient exploration of mildly context-sensitive [MCS] grammar formalisms. An RCG implementation exists and has been used very successfully, for instance, for Tree Adjoining Grammars [TAG].
- Automata and Dynamic Programming: Stack automata can describe various parsing strategies, and Dynamic Programming interpretations of these automata are derived to design tabular parsers. This approach is implemented within the DyALog system and works for various unification-based formalisms (DCG, Feature TAG, RCG, ...) and for logic programs.
- Linguistic Infrastructure: ATOLL develops a workbench for TAG based on XML representations. This workbench includes parsers built with the RCG and DyALog systems, as well as servers giving access to these parsers, to the grammars, and to the derivation forests produced by the parsers.
- Knowledge acquisition: In this emerging theme, we would like to explore the interactions between lexical knowledge and parsing. Richer knowledge about words may help parsing, and conversely, parsing may be used to extract knowledge from corpora.
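To make the idea of "tabular techniques to handle ambiguities" concrete, here is a minimal sketch using the textbook CKY algorithm, which is a simpler technique than the automata-based approach of DyALog described above. It assumes a hypothetical toy grammar in Chomsky normal form (the rules and lexicon are illustrative, not ATOLL's). Each chart cell stores, per category, the number of derivations over a span, so shared sub-analyses are computed once and counts are multiplied instead of enumerating every tree.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy grammar in Chomsky normal form (illustrative only).
BINARY: List[Tuple[str, str, str]] = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "NP", "PP"),
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON: Dict[str, List[str]] = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "a": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words: List[str], start: str = "S") -> int:
    """Return the number of distinct parses of `words` rooted in `start`."""
    n = len(words)
    # chart[i][j][A] = number of derivations of category A over words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    # Fill width-1 spans from the lexicon.
    for i, w in enumerate(words):
        for cat in LEXICON.get(w, []):
            chart[i][i + 1][cat] += 1
    # Wider spans only combine strictly smaller sub-spans, already complete.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for lhs, left, right in BINARY:
                    lcount = chart[i][k].get(left, 0)
                    rcount = chart[k][j].get(right, 0)
                    if lcount and rcount:
                        # Ambiguity is packed: counts multiply, trees are not copied.
                        chart[i][j][lhs] += lcount * rcount
    return chart[0][n].get(start, 0)
```

For example, `count_parses("I saw the man with a telescope".split())` returns 2, one parse per attachment site of the prepositional phrase, while `count_parses("I saw the man".split())` returns 1.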
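The RCG item above says that RCGs go beyond context-free grammars. A minimal sketch, assuming a hypothetical three-clause RCG for the non-context-free language {aⁿbⁿcⁿ}, shows how RCG predicates take ranges (pairs of input positions) as arguments. The recognizer below is hand-translated from that one grammar and memoized on range tuples; it is not Boullier's general parsing algorithm nor ATOLL's RCG system.

```python
from functools import lru_cache
from typing import Tuple

Range = Tuple[int, int]  # half-open interval [lo, hi) over the input string

# Hypothetical RCG (illustrative assumption):
#   S(X Y Z)          -> A(X, Y, Z)
#   A(a X, b Y, c Z)  -> A(X, Y, Z)
#   A(eps, eps, eps)  -> eps


def recognize_anbncn(w: str) -> bool:
    n = len(w)

    @lru_cache(maxsize=None)
    def A(x: Range, y: Range, z: Range) -> bool:
        # Clause A(eps, eps, eps) -> eps: all three ranges are empty.
        if all(lo == hi for lo, hi in (x, y, z)):
            return True
        (xl, xh), (yl, yh), (zl, zh) = x, y, z
        # Clause A(aX, bY, cZ) -> A(X, Y, Z): strip one symbol from each range.
        if (xl < xh and yl < yh and zl < zh
                and w[xl] == "a" and w[yl] == "b" and w[zl] == "c"):
            return A((xl + 1, xh), (yl + 1, yh), (zl + 1, zh))
        return False

    # Clause S(XYZ) -> A(X, Y, Z): try every split of the whole input range.
    return any(A((0, i), (i, j), (j, n))
               for i in range(n + 1) for j in range(i, n + 1))
```

With this grammar, `recognize_anbncn("aabbcc")` is true while `recognize_anbncn("aabbc")` and `recognize_anbncn("abcabc")` are false: the single clause for `A` consumes one symbol from each of the three ranges at once, a synchronization that no context-free grammar can express.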
International and industrial relations
- Action Normalangue: This French action concerns the standardization of linguistic resources.
- ARC "Lexical Resources for TAGs" [RLT]: In cooperation with "Langue et Dialogue" (LORIA, Nancy) and TALaNa (University Paris 7). The main objective of this action is the semi-automatic acquisition of lexicon entries for TALaNa's French TAG grammar, using information obtained by parsing corpora.
- ARC "Generation and Inference" [GENI]: In cooperation with "Langue et Dialogue", Orpailleur (LORIA), Lattice and ILPL (IRIT, Toulouse). ATOLL provides expertise on TAGs and is interested in some aspects of lexical semantics.
- Action FASTLING: Bi-national action between ATOLL, CENTRIA (Lisbon) and LIFO (University of Orleans). This action extends an earlier one in which a robust Portuguese parser was developed using
- Action "botanic": This action, in collaboration with IRD, is still informal. Its objective is the processing of botanical corpora, in particular using linguistic techniques for text mining.