Increase the DIgital VITALity and visibility of languages of France:
linguistic descriptions and annotated corpora

ANR-21-CE27-0004

Description

This project aims to increase the vitality and visibility of several languages of France: Alsatian, Corsican, Occitan and Poitevin-Saintongeais. It sits at the crossroads of descriptive linguistics and corpus linguistics. Its main goal is to create resources, in particular raw and annotated corpora, with the following objectives:

  • Build (i) monolingual corpora in genres that are close to spoken language or transcribe it, for example plays or narrative ethnotexts, and (ii) parallel corpora (from translations);
  • Develop annotated corpora in the “Universal Dependencies” framework;
  • Produce complete, up-to-date linguistic descriptions and formalisations based on corpora;
  • Raise awareness in the NLP (Natural Language Processing) community of the problems of non-standardised languages and the need to take variation into account in NLP systems;
  • Share and transfer experiences and tools between languages in the project and explore methods of technology transfer.