About me

I’m a PhD student in Computer Science working on Natural Language Processing (NLP) for historical documents at Sorbonne Université and in the ALMAnaCH research team at Inria.

I am interested in large corpora for training language models, especially for under-resourced and historical languages. I am also interested in tasks such as Named Entity Recognition (NER), dependency parsing, part-of-speech tagging, machine translation and document structuring.

I love coffee, cookies and maths.

Interests
  • Language modeling
  • Corpus linguistics
  • Named Entity Recognition
  • Machine Translation
  • Computational Linguistics
Education
  • PhD in Computer Science

    Sorbonne Université

  • BASc MIASHS, 2018

    Université Paris 8

  • MSc in Mathematics, 2017

    Aix-Marseille Université

  • BSc in Mathematics, 2016

    Universidad Nacional de Colombia

Recent Publications

Projects

BASNUM

Digitization and analysis of Basnage de Beauval’s Universal Dictionary: lexicography and scientific networks

CamemBERT

A state-of-the-art language model for French.

OSCAR

OSCAR (Open Super-large Crawled Aggregated coRpus) is a huge multilingual corpus.

Contact