Gwénolé LECORVÉ

PhD, Habil.

Research Scientist


Orange
Innovation / Data & AI
Team NADIA (NAtural DIAlogue)

2, avenue Pierre Marzin
22307 Lannion, France

FirstName . LastName w/o accents [at] orange [dot] com


Research activities

Dialogue 2021-present

  • Natural language generation for question-answering
Research mates: Dr. Quentin Brabant, Dr. Lina Rojas Barahona

Paraphrase generation 2018-present

  • Paraphrase generation using constraints
  • Unsupervised style transfer
Research mates: Ms. Somayeh Jafaritazehjani, Dr. Nazanin Dehghani, Pr. John D. Kelleher, Dr. Jonathan Chevelu, Dr. Damien Lolive

Language for children 2018-present

  • Characterization of what makes texts understandable by children
  • Prediction of age recommendation for texts
  • Explaining inadequacy
Research mates: Dr. Rashedur Rahman, Ms. Aline Étienne, Dr. Nicolas Béchet, Pr. Delphine Battistelli, Dr. Jonathan Chevelu

Casualness and formality 2015-present

  • Characterization of descriptors for the casual, neutral and formal linguistic registers
  • Extraction of sequential patterns
  • Automatic collection of register-specific texts
Research mates: Dr. Nicolas Béchet, Pr. Delphine Battistelli, Dr. Jonathan Chevelu, Dr. Nazanin Dehghani, Dr. Inès Dabbebi, Ms. Jade Mekki, Mr. Benoît Fournier, Mr. Hugo Ayats

Text-to-speech (TTS) 2012-2021

  • DNN TTS
  • Under-resourced languages
  • Data annotation & perceptual evaluation
Research mates: Mr. Antoine Perquin, Dr. Damien Lolive, Dr. Laurent Amsaleg, Ms. Gaëlle Vidal, Mr. Hassan Hajipoor, Mr. Simon Giddings, Dr. Pierre Alain, Mr. Quentin Di-Fant, Dr. Arnaud Delhay, Dr. Waseem Safi, Mr. Pascal Lintanf

Pronunciation modeling 2012-2021

  • Grapheme-to-phoneme conversion
  • Pronunciation adaptation
    • Spontaneous and expressive speech
    • Voice-specific adaptation
Research mates: Dr. Raheel Qader, Dr. Damien Lolive, Dr. Marie Tahon, Pr. Pascale Sébillot, Dr. Katarina Bartkova, Mr. Simon Giddings, Dr. Aghilas Sini, Dr. Cédric Fayet

Disfluency modeling 2016-2018

  • Automatic insertion of disfluencies
  • Interruption point prediction
  • Natural language generation
Research mates: Dr. Raheel Qader, Dr. Damien Lolive, Pr. Pascale Sébillot, Mr. Henri Lasselin

Speech recognition and language modeling 2007-2012

  • Unsupervised topic adaptation of an automatic speech recognition system (Ph.D. thesis)
  • Linguistic adaptation for automatic speech recognition
  • Language modeling using recurrent neural networks
  • Multilingual language modeling
Research mates: Dr. Guillaume Gravier, Pr. Pascale Sébillot, Dr. Petr Motlicek, Dr. John Dines, Pr. Thomas Hain, Dr. Camille Guinaudeau, Dr. Stéphane Huet, Dr. David Imseng

Project involvement

  • Breton Text-To-Speech: Creation of tools and data for Breton speech synthesis
    • Public Breton Language Office
    • Deputy coordinator
    • 2019-2020
  • ANR TextToKids: Language register transformation using linguistic pattern extraction
    • French National Research Agency
    • Deputy coordinator, local coordinator
    • 2019-2023
  • PEPS TextToKids: Language register transformation using linguistic pattern extraction
    • CNRS (French National Center for Scientific Research)
    • Coordinator
    • 2018
  • H2020 NADINE: Digital iNtegrAteD system for the socIal support of migraNts and refugEes
    • European Commission
    • 2018-2021
  • P2IA: Interactive application to help children learn to read and write
    • French Ministry of Education
    • Local coordinator
    • 2019-2021
  • Kaligo DYS: Learning to write for dysgraphic and dyspraxic children and young adults
    • Pôle Images & Réseaux / Regional + EU funding
    • Local coordinator
    • 2018-2020
  • ANR TREMoLo: Language register transformation using linguistic pattern extraction
    • French National Research Agency
    • Coordinator
    • 2017-2021
  • ANR SynPaFlex: Flexible speech synthesis
    • French National Research Agency
    • Task leader on pronunciation adaptation
    • 2015-2019
  • TAO CSR: Task adaptation and optimization for continuous speech recognition
    • Swiss Commission for Technology and Innovation (CTI)
    • 2011-2012

Collaborations and Supervisions

  • Current
    • Nazanin Dehghani (postdoctoral researcher, since 2019, with Jonathan Chevelu)
    • Aline Étienne (PhD candidate, since 2018, with Delphine Battistelli)
    • Simon Giddings (engineer, since 2019, with Damien Lolive)
    • Hassan Hajipoor (engineer, since 2019, with Damien Lolive)
    • Somayeh Jafaritazehjani (PhD candidate, since 2018, with Damien Lolive and John D. Kelleher)
    • Pascal Lintanf (linguistic expert, since 2020, with Damien Lolive)
    • Jade Mekki (PhD candidate + research intern, since 2018, with Nicolas Béchet and Delphine Battistelli)
    • Antoine Perquin (PhD candidate + MSc intern, since 2017, with Damien Lolive and Laurent Amsaleg)
    • Md Rashedur Rahman (postdoctoral researcher, since 2020, with Nicolas Béchet)
    • Gaëlle Vidal (engineer, with Damien Lolive)
  • Past
    • Hugo Ayats (research intern, 2017)
    • Alexis Blandin (MSc intern, 2018-2019, with Delphine Battistelli)
    • Charlotte Bourgoin (MSc intern, 2018, with Delphine Battistelli)
    • Inès Dabbebi (MSc intern, 2015, with Nicolas Béchet)
    • Quentin Di-Fant (engineer, from 2018, with Damien Lolive)
    • Aline Étienne (MSc intern, 2019, with Delphine Battistelli)
    • Cédric Fayet (postdoctoral researcher, from 2019, with Damien Lolive and Arnaud Delhay)
    • Benoît Fournier (research intern, 2017)
    • Henri Lasselin (MSc intern, 2017-2018)
    • Raheel Qader (PhD candidate, 2014-2017, with Damien Lolive and Pascale Sébillot)
    • Waseem Safi (postdoctoral researcher, 2018-2019, with Damien Lolive and Arnaud Delhay)
    • Marie Tahon (postdoctoral researcher, 2015-2017, with Damien Lolive)

Short bio

2021-present Research scientist at Orange.
2020 Habilitation to supervise research in computer science from the University of Rennes 1.
2019-2021 Nominated deputy member of the French National Council of Universities (Conseil National des Universités, CNU), Computer Science section.
2014-2021 Permanent member of the Expression research team at IRISA (Lannion, France).
2012-2021 Associate professor at ENSSAT/University of Rennes 1 (Lannion, France).
2012-2013 Permanent member of the Cordial research team at IRISA (Lannion, France).
2011-2012 Postdoctoral researcher at Idiap Research Institute (Martigny, Switzerland).
2007-2010 Ph.D. candidate in computer science in the multimedia group (Texmex) at IRISA/INRIA (INSA de Rennes, France).
2007
  • M.Sc. in image processing and artificial intelligence at the Institut National des Sciences Appliquées (INSA de Rennes, France)
  • M.Eng. in computer science at the Institut National des Sciences Appliquées (INSA de Rennes, France)
2004 B.Sc. in mathematics and computer science at the Université de Bretagne Sud (Lorient, France).

Teaching


Currently (ENSSAT)

  • Machine Learning (M.Eng. & M.Sc.)

Past activities (ENSSAT, INSA Rennes)

  • Speech processing (M.Sc.)
  • Artificial Intelligence (M.Eng.)
  • Distributed algorithmics (M.Eng. & M.Sc.)
  • Java, object-oriented programming (B.Sc. & M.Eng.)
  • Graph theory (M.Eng.)
  • Algorithmics (M.Eng.)
  • Unix and operating systems (M.Eng.)
  • Databases (B.Sc.)
  • Data structures (B.Sc.)
  • Symbolic data mining (M.Eng.)
  • Introduction to algorithmics and functional programming with Scheme (B.Sc.)


Publications

Theses (2)

  1. Gwénolé Lecorvé. Traitement automatique du style dans le langage naturel : quelques contributions et perspectives / Natural Language Style Processing: Some Contributions and Perspectives. Research habilitation thesis, Université de Rennes 1, 2020.
  2. Gwénolé Lecorvé. Adaptation thématique non supervisée d'un système de reconnaissance automatique de la parole (Unsupervised topic adaptation of an automatic speech recognition system). PhD thesis, INSA de Rennes, 2010. AFCP Best PhD Award.

Book chapters (1)

  1. Stéphane Huet, Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot. Toward the Integration of Natural Language Processing and Automatic Speech Recognition: Using Morpho-syntax and pragmatics for transcription. In Petros Maragos, Alexandros Potamianos, Patrick Gros (eds.), Multimodal Processing and Interaction: Audio, Video, Text, Springer, 2008.

International journals (2)

  1. Guillaume Gravier, Camille Guinaudeau, Gwénolé Lecorvé, Pascale Sébillot. Exploiting speech for automatic TV delinearization: from streams to cross-media semantic navigation. EURASIP Journal on Image and Video Processing, 2011.
  2. Marie Tahon, Gwénolé Lecorvé, Damien Lolive. Can we Generate Emotional Pronunciations for Expressive Speech Synthesis? IEEE Transactions on Affective Computing, 2018.

International conferences (22)

  1. Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot. An unsupervised Web-based topic language model adaptation method. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2008.
  2. Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot. On the use of Web resources and natural language processing techniques to improve automatic speech recognition systems. In Proceedings of the Language Resources and Evaluation Conference (LREC), 2008.
  3. Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot. Constraint selection for topic-based MDI adaptation of language models. In Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), 2009.
  4. Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot. Automatically finding semantically consistent n-grams to add new words in LVCSR systems. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2011.
  5. Gwénolé Lecorvé, John Dines, Thomas Hain, Petr Motlicek. Supervised and unsupervised Web-based language model domain adaptation. In Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), 2012.
  6. Gwénolé Lecorvé, Petr Motlicek. Conversion of recurrent neural network language models to weighted finite state transducers for automatic speech recognition. In Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), 2012.
  7. David Imseng, Hervé Bourlard, Holger Caesar, Philip N. Garner, Gwénolé Lecorvé, Alexandre Nanchen. MediaParl: bilingual mixed language accented speech database. In Proceedings of the IEEE Workshop on Spoken Language Technology (SLT), 2012.
  8. Jonathan Chevelu, Gwénolé Lecorvé, Damien Lolive. ROOTS: a toolkit for easy, fast and consistent processing of large sequential annotated data collections. In Proceedings of the Language Resources and Evaluation Conference (LREC), 2014.
  9. Gwénolé Lecorvé, Damien Lolive. Adaptive statistical utterance phonetization for French. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015.
  10. Raheel Qader, Gwénolé Lecorvé, Damien Lolive, Pascale Sébillot. Probabilistic Speaker Pronunciation Adaptation for Spontaneous Speech Synthesis Using Linguistic Features. In Proceedings of the International Conference on Statistical Language and Speech Processing (SLSP), 2015.
  11. Marie Tahon, Raheel Qader, Gwénolé Lecorvé, Damien Lolive. Improving TTS with corpus-specific pronunciation adaptation. In Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), 2016.
  12. Marie Tahon, Raheel Qader, Gwénolé Lecorvé, Damien Lolive. Optimal feature set and minimal training size for pronunciation adaptation in TTS. In Proceedings of the International Conference on Statistical Language and Speech Processing (SLSP), 2016.
  13. Raheel Qader, Gwénolé Lecorvé, Damien Lolive, Marie Tahon, Pascale Sébillot. Statistical Pronunciation Adaptation for Spontaneous Speech Synthesis. In Proceedings of the International Conference on Text, Speech and Dialogue (TSD), 2017.
  14. Marie Tahon, Gwénolé Lecorvé, Damien Lolive, Raheel Qader. Perception of expressivity in TTS: linguistics, phonetics or prosody? In Proceedings of the International Conference on Statistical Language and Speech Processing (SLSP), 2017.
  15. Antoine Perquin, Gwénolé Lecorvé, Damien Lolive, Laurent Amsaleg. Phone-Level Embeddings for Unit Selection Speech Synthesis. In Proceedings of the International Conference on Statistical Language and Speech Processing (SLSP), 2018.
  16. Raheel Qader, Gwénolé Lecorvé, Damien Lolive, Pascale Sébillot. Disfluency Insertion for Spontaneous TTS: Formalization and Proof of Concept. In Proceedings of the International Conference on Statistical Language and Speech Processing (SLSP), 2018.
  17. Gwénolé Lecorvé, Hugo Ayats, Benoît Fournier, Jade Mekki, Jonathan Chevelu, Delphine Battistelli, Nicolas Béchet. Towards the Automatic Processing of Language Registers: Semi-supervisedly Built Corpus and Classifier for French. In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2019.
  18. Alexis Blandin, Gwénolé Lecorvé, Delphine Battistelli, Aline Étienne. Age Recommendation for Texts. In Proceedings of the Language Resources and Evaluation Conference (LREC), 2020.
  19. Rashedur Rahman, Gwénolé Lecorvé, Aline Étienne, Delphine Battistelli, Nicolas Béchet, Jonathan Chevelu. Mama/Papa, Is this Text for Me? In Proceedings of the International Conference on Computational Linguistics (COLING), short paper, 2020.
  20. Somayeh Jafaritazehjani, Gwénolé Lecorvé, Damien Lolive, John Kelleher. Style versus Content: A distinction without a (learnable) difference? In Proceedings of the International Conference on Computational Linguistics (COLING), long paper, 2020.
  21. Somayeh Jafaritazehjani, Gwénolé Lecorvé, Damien Lolive, John Kelleher. Style as Sentiment Versus Style as Formality: The Same or Different? In Proceedings of the International Conference on Artificial Neural Networks (ICANN), 2021.
  22. Jade Mekki, Gwénolé Lecorvé, Delphine Battistelli, Nicolas Béchet. TREMoLo-Tweets: a Multi-Label Corpus of French Tweets for Language Register Characterization. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP), 2021.

International workshops (7)

  1. David Imseng, Hervé Bourlard, Holger Caesar, Philip N. Garner, Gwénolé Lecorvé, Alexandre Nanchen. MediaParl: Bilingual mixed language accented speech database. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), 2012.
  2. Pierre Alain, Jonathan Chevelu, David Guennec, Gwénolé Lecorvé, Damien Lolive. The IRISA Text-To-Speech System for the Blizzard Challenge 2015. In Proceedings of the Blizzard Challenge 2015 Workshop, 2015, Berlin, Germany.
  3. Pierre Alain, Jonathan Chevelu, David Guennec, Gwénolé Lecorvé, Damien Lolive. The IRISA Text-To-Speech System for the Blizzard Challenge 2016. In Proceedings of the Blizzard Challenge 2016 Workshop, 2016, Sunnyvale, USA.
  4. Pierre Alain, Nelly Barbot, Jonathan Chevelu, Gwénolé Lecorvé, Damien Lolive, Claude Simon, Marie Tahon. The IRISA Text-To-Speech System for the Blizzard Challenge 2017. In Proceedings of the Blizzard Challenge 2017 Workshop, 2017, Stockholm, Sweden.
  5. Pierre Alain, Gwénolé Lecorvé, Damien Lolive, Antoine Perquin. The IRISA Text-To-Speech System for the Blizzard Challenge 2018. In Proceedings of the Blizzard Challenge 2018 Workshop, 2018, Hyderabad, India.
  6. Tabita Toparlak, Anaïd Donabédian, Damien Lolive, Gwénolé Lecorvé, Élisabeth Delais-Roussarie. Synthèse vocale de l’arménien. Presentation at Digital Armenian, 2019.
  7. Nazanin Dehghani, Hassan Hajipoor, Jonathan Chevelu, Gwénolé Lecorvé. Controllable Paraphrase Generation with Multiple Types of Constraints. In NeurIPS Workshop on Controllable Generative Modeling in Language and Vision (CtrlGen), 2021.

French-speaking conferences (16)

  1. Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot. Vers une adaptation thématique non supervisée de modèles de langage : utilisation d'Internet comme un corpus ouvert. In Proceedings of the Journées d'Études sur la Parole (JEP), 2008.
  2. Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot. L'adaptation thématique d'un modèle de langue fait-elle apparaître des mots thématiques ? In Proceedings of the Journées d'Études sur la Parole (JEP), 2010.
  3. Gwénolé Lecorvé, John Dines, Thomas Hain, Petr Motlicek. Impact du degré de supervision sur l'adaptation à un domaine d'un modèle de langage à partir du Web. In Proceedings of the Journées d'Études sur la Parole (JEP), 2012.
  4. Jonathan Chevelu, Gwénolé Lecorvé, Damien Lolive. ROOTS : un outil pour manipuler facilement, efficacement et avec cohérence des corpus annotés de séquences. In Proceedings of the Journées d'Études sur la Parole (JEP), 2014.
  5. Gwénolé Lecorvé, Damien Lolive. Phonétisation statistique adaptable d'énoncés pour le français. In Proceedings of the Journées d'Études sur la Parole (JEP), 2016.
  6. Raheel Qader, Gwénolé Lecorvé, Damien Lolive, Pascale Sébillot. Adaptation de la prononciation pour la synthèse de la parole spontanée en utilisant des informations linguistiques. In Proceedings of the Journées d'Études sur la Parole (JEP), 2016.
  7. Raheel Qader, Gwénolé Lecorvé, Damien Lolive, Pascale Sébillot. Ajout automatique de disfluences pour la synthèse de la parole spontanée : formalisation et preuve de concept. In Proceedings of the conference Traitement Automatique des Langues Naturelles (TALN), long paper, 2017. Best Paper Award.
  8. Gwénolé Lecorvé, Hugo Ayats, Benoît Fournier, Jade Mekki, Jonathan Chevelu, Delphine Battistelli, Nicolas Béchet. Construction conjointe d'un corpus et d'un classifieur pour les registres de langue en français. In Proceedings of the conference Traitement Automatique des Langues Naturelles (TALN), long paper, 2018.
  9. Jade Mekki, Delphine Battistelli, Gwénolé Lecorvé, Nicolas Béchet. Identification de descripteurs pour la caractérisation de registres. In Proceedings of the Rencontres des Jeunes Chercheurs of the CORIA-TALN conference, 2018.
  10. Antoine Perquin, Gwénolé Lecorvé, Damien Lolive, Laurent Amsaleg. Évaluation objective de plongements pour la synthèse de parole guidée par réseaux de neurones. In Proceedings of the conference Traitement Automatique des Langues Naturelles (TALN), 2019.
  11. Alexis Blandin, Gwénolé Lecorvé, Delphine Battistelli, Aline Étienne. Recommandation d'âge pour des textes. In Proceedings of the joint conference Journées d'Études sur la Parole (JEP), Traitement Automatique des Langues Naturelles (TALN), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL), 2020.
  12. Aline Étienne, Delphine Battistelli, Gwénolé Lecorvé. L’expression des émotions dans les textes pour enfants : constitution d’un corpus annoté. In Proceedings of the joint conference Journées d'Études sur la Parole (JEP), Traitement Automatique des Langues Naturelles (TALN), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL), 2020.
  13. Cédric Fayet, Alexis Blond, Grégoire Coulombel, Claude Simon, Damien Lolive, Gwénolé Lecorvé, Jonathan Chevelu, Sébastien Le Maguer. FlexEval, création de sites web légers pour des campagnes de tests perceptifs multimédias. In Proceedings of the joint conference Journées d'Études sur la Parole (JEP), Traitement Automatique des Langues Naturelles (TALN), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL), demo paper, 2020.
  14. Jade Mekki, Nicolas Béchet, Delphine Battistelli, Gwénolé Lecorvé. Caractérisation de registres de langue par extraction de motifs séquentiels émergents. In Proceedings of the Journées Internationales d'Analyse statistique des Données Textuelles (JADT), 2020.
  15. Aline Étienne, Delphine Battistelli, Gwénolé Lecorvé. Apports de la linguistique et du TAL à l’analyse des émotions dans les textes pour enfants. In Proceedings of the Langage et éMOTions symposium, 2020.
  16. Jade Mekki, Nicolas Béchet, Delphine Battistelli, Gwénolé Lecorvé. TREMoLo : un corpus multi-étiquettes de tweets en français pour la caractérisation des registres de langue. In Proceedings of the conference Traitement Automatique des Langues Naturelles (TALN), 2021.

Public Software

RNN2WFST

Conversion of recurrent neural network language models to weighted finite state transducers. Cite as follows:
@inproceedings{lecorve2012conversion, title={Conversion of recurrent neural network language models to weighted finite state transducers for automatic speech recognition}, author={Lecorv{\'e}, Gw{\'e}nol{\'e} and Motlicek, Petr}, booktitle={Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech)}, year={2012}, location={Portland, Oregon, USA} }

> More on GitHub

Roots

Open source toolkit dedicated to annotated sequential data generation, management and processing. Cite as follows:
@inproceedings{chevelu2014roots, title={ROOTS: a toolkit for easy, fast and consistent processing of large sequential annotated data collections}, author={Chevelu, Jonathan and Lecorv{\'e}, Gw{\'e}nol{\'e} and Lolive, Damien}, booktitle={Proceedings of the Language Resources and Evaluation Conference (LREC)}, pages={619--626}, year={2014}, location={Reykjavik, Iceland} }

> More on INRIA's GitLab

IRISA Text Normalizer

Perl scripts to tokenize and normalize texts (French and English are currently supported).

@misc{lecorve2017normalizer, title={The IRISA Text Normalizer}, author={Lecorv{\'e}, Gw{\'e}nol{\'e}}, howpublished={\url{https://github.com/glecorve/irisa-text-normalizer}}, year={2017} }
> More on GitHub

FlexEval

Lightweight Python tool to create web-based annotation campaigns.

@inproceedings{fayet2020flexeval, title={FlexEval, cr{\'e}ation de sites web l{\'e}gers pour des campagnes de tests perceptifs multim{\'e}dias}, author={Fayet, C{\'e}dric and Blond, Alexis and Coulombel, Gr{\'e}goire and Simon, Claude and Lolive, Damien and Lecorv{\'e}, Gw{\'e}nol{\'e} and Chevelu, Jonathan and Le Maguer, S{\'e}bastien}, booktitle={6e conf{\'e}rence conjointe Journ{\'e}es d'{\'E}tudes sur la Parole (JEP, 31e {\'e}dition), Traitement Automatique des Langues Naturelles (TALN, 27e {\'e}dition), Rencontre des {\'E}tudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (R{\'E}CITAL, 22e {\'e}dition)}, pages={22--25}, year={2020}, organization={ATALA} }
> More on INRIA's GitLab
> More on GitHub