Michaela Geierhos


2024

Curation of Benchmark Templates for Measuring Gender Bias in Named Entity Recognition Models
Ana Cimitan | Ana Alves Pinto | Michaela Geierhos
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Named Entity Recognition (NER) constitutes a popular machine learning technique that empowers several natural language processing applications. As with other machine learning applications, NER models have been shown to be susceptible to gender bias. The latter is often assessed using benchmark datasets, which in turn are curated specifically for a given Natural Language Processing (NLP) task. In this work, we investigate the robustness of benchmark templates to detect gender bias and propose a novel method to improve the curation of such datasets. The method, based on masked token prediction, aims to filter out benchmark templates with a higher probability of detecting gender bias in NER models. We tested the method for English and German, using the corresponding fine-tuned BERT base model (cased) as the NER model. The gender gaps detected with templates classified as appropriate by the method were statistically larger than those detected with inappropriate templates. The results were similar for both languages and support the use of the proposed method in the curation of templates designed to detect gender bias.

2021

Using Bloom’s Taxonomy to Classify Question Complexity
Sabine Ullrich | Michaela Geierhos
Proceedings of the 4th International Conference on Natural Language and Speech Processing (ICNLSP 2021)

2017

Annotation Challenges for Reconstructing the Structural Elaboration of Middle Low German
Nina Seemann | Marie-Luis Merten | Michaela Geierhos | Doris Tophinke | Eyke Hüllermeier
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

In this paper, we present the annotation challenges we have encountered when working on a historical language that was undergoing elaboration processes. We especially focus on syntactic ambiguity and gradience in Middle Low German, which causes uncertainty to some extent. Since current annotation tools consider construction contexts and the dynamics of the grammaticalization only partially, we plan to extend CorA - a web-based annotation tool for historical and other non-standard language data - to capture elaboration phenomena and annotator unsureness. Moreover, we seek to interactively learn morphological as well as syntactic annotations.

2016

On- and Off-Topic Classification and Semantic Annotation of User-Generated Software Requirements
Markus Dollmann | Michaela Geierhos
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2008

RELAX — Extraction de relations sémantiques dans les contextes biographiques [RELAX — Extraction of Semantic Relations in Biographical Contexts]
Michaela Geierhos | Olivier Blanc | Sandra Bsiri
Traitement Automatique des Langues, Volume 49, Numéro 1 : Varia [Varia]