Clément Dalloux

PhD student in computer science in Rennes (France)
Text mining and information extraction in clinical data


Since December 2016 - PhD student @IRISA, LINKMEDIA team

The IRISA teams in Rennes are hosted by INRIA

My thesis is part of the BigClin project, which aims to develop a new representation of clinical records based on fine-grained semantic annotation, made possible by new NLP tools dedicated to French clinical narratives. The project also addresses distributed systems issues: scalability, management of uncertain data and privacy, stream processing at runtime, etc.
My role in this context is to develop and test NLP methods and tools for processing unstructured clinical data in French. These methods have to be based on algorithms able to capture the semantics of the texts efficiently. Targeted tasks include indexing medical content, text mining, information extraction, detection of uncertainty and negation, etc. These NLP tasks will rely on precise semantic annotation.

May 14-18 2018 @CORIA-TALN-RJC 2018, Rennes, France

In this picture of the organizing committee, I am carefully hidden ;)

I took part in the 2018 CORIA-TALN-RJC conference in three different ways. Before the conference, I reviewed papers for RJC (papers by Master's and PhD students). Then, as part of the organizing committee, I worked on the website, communication with attendees, social media, etc. During the conference, I guided and informed people and handled many other tasks. Moreover, on Thursday, I presented a poster on negation detection in French and Brazilian Portuguese.

Nov-Dec 2017 @PUCPR, Curitiba, Brazil


As part of the BigClin/Figtem projects, I spent a month at the Pontifical Catholic University of Paraná in Curitiba, Brazil, where I worked on classification tasks using supervised learning algorithms on clinical data.

2015 - Master's degree internship @LIMICS


During my master's degree internship, I worked on the Accordys project under the supervision of a PhD student. The main task of this internship was to evaluate the performance of document indexing models such as TF-IDF, LSI, LDA, etc. at the document level in order to compute the similarity between several cases of fetal malformation. To do so, we used a small corpus composed of fetoplacental examinations written in free-text form.
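As a rough illustration of the kind of document-level comparison described above, here is a minimal sketch of TF-IDF weighting followed by cosine similarity, using only the Python standard library. It is not the code used during the internship: the function names, the whitespace tokenization, the smoothed IDF formula, and the three toy documents are all assumptions made up for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Naive whitespace tokenization (an assumption; real reports need more care).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Relative term frequency times a smoothed inverse document frequency.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented toy documents, not taken from the actual corpus.
docs = [
    "placenta small with infarct",
    "placenta small without infarct",
    "normal cord insertion",
]
vectors = tfidf_vectors(docs)
sim_close = cosine(vectors[0], vectors[1])  # documents sharing most terms
sim_far = cosine(vectors[0], vectors[2])    # documents sharing no term
```

In this sketch, the first two documents share most of their vocabulary and therefore score higher than the pair with no term in common. LSI or LDA would replace the sparse term vectors with lower-dimensional topic vectors before the same cosine comparison.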

2013-2015 - Master's degree in Linguistics @Université Bordeaux-Montaigne

I obtained my master’s degree in Linguistic Research and Applied Computations from Université Bordeaux-Montaigne. I took many courses in linguistic theory (discourse analysis, syntax, semantics, etc.) and applied linguistics (natural language processing, corpus linguistics, etc.).


Dalloux, C., Claveau, V., and Grabar, N., Speculation and Negation Detection in French Biomedical Corpora, Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2019, Varna, Bulgaria, September, 2019

Grabar, N., Claveau, V., and Dalloux, C., CAS: French Corpus with Clinical Cases, Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis, Louhi@EMNLP 2018, Brussels, Belgium, October 31, 2018, 122-128, 2018. Link

Dalloux, C., Grabar, N., Claveau, V., and Moro, C. (2018). Portée de la négation : détection par apprentissage supervisé en français et portugais brésilien. 25e Conférence sur le Traitement Automatique des Langues Naturelles (TALN).

Dalloux, C., Claveau, V., and Grabar, N. (2017). Détection de la négation : corpus français et apprentissage supervisé. In SIIM 2017 - Symposium sur l’Ingénierie de l’Information Médicale, Toulouse, France, pp. 1–8. Link

Dalloux, C. (2017). Détection de l’incertitude et de la négation : un état de l’art. In 19es REncontres jeunes Chercheurs en Informatique pour le TAL (RECITAL 2017), pp. 94–107. Link


We manually annotated two corpora from the biomedical field. The ESSAI corpus contains clinical trial protocols in French. They were mainly obtained from the National Cancer Institute. A typical protocol consists of two parts: a summary of the trial, which indicates its purpose and the methods applied; and a detailed description of the trial with the inclusion and exclusion criteria. The CAS corpus contains clinical cases published in scientific literature and training material. They are published in different journals from French-speaking countries (France, Belgium, Switzerland, Canada, African countries, tropical countries) and are related to various medical specialties (cardiology, urology, oncology, obstetrics, pulmonology, gastroenterology). The purpose of clinical cases is to describe the clinical situations of patients. Hence, their content is close to that of clinical narratives (description of diagnoses, treatments or procedures, evolution, family history, expected audience, etc.). In clinical cases, negation is frequently used to describe patients' signs, symptoms, and diagnoses. Speculation is present as well, but less frequently.

Annotation ESSAI CAS
Negative sentences 805
Negative instances 913
Speculative sentences 631 226
Speculative instances 754 244

