A Benchmark Evaluation of Clinical Named Entity Recognition in French

Nesrine Bannour, Christophe Servan, Aurélie Névéol, Xavier Tannier


Abstract
Background: Transformer-based language models have shown strong performance on many Natural Language Processing (NLP) tasks. Masked Language Models (MLMs) attract sustained interest because they can be adapted to different languages and sub-domains through training or fine-tuning on specific corpora while remaining lighter than modern Large Language Models (LLMs). Recently, several MLMs have been released for the biomedical domain in French, and experiments suggest that they outperform standard French counterparts. However, no systematic evaluation comparing all models on the same corpora is available. Objective: This paper presents an evaluation of masked language models for biomedical French on the task of clinical named entity recognition. Material and methods: We evaluate the biomedical models CamemBERT-bio and DrBERT and compare them to the standard French models CamemBERT, FlauBERT and FrAlBERT, as well as multilingual mBERT, using three publicly available corpora for clinical named entity recognition in French. The evaluation set-up relies on gold-standard corpora as released by the corpus developers. Results: Results suggest that CamemBERT-bio consistently outperforms DrBERT, while FlauBERT offers competitive performance and FrAlBERT achieves the lowest carbon footprint. Conclusion: This is the first benchmark evaluation of biomedical masked language models for French clinical entity recognition that compares model performance consistently on nested entity recognition using metrics covering performance and environmental impact.
Anthology ID:
2024.lrec-main.2
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
14–21
URL:
https://aclanthology.org/2024.lrec-main.2
Cite (ACL):
Nesrine Bannour, Christophe Servan, Aurélie Névéol, and Xavier Tannier. 2024. A Benchmark Evaluation of Clinical Named Entity Recognition in French. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14–21, Torino, Italia. ELRA and ICCL.
Cite (Informal):
A Benchmark Evaluation of Clinical Named Entity Recognition in French (Bannour et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.2.pdf