Rebalancing Label Distribution While Eliminating Inherent Waiting Time in Multi Label Active Learning Applied to Transformers

Maxime Arens, Lucile Callebert, Mohand Boughanem, Jose G. Moreno


Abstract
Data annotation is crucial for machine learning, notably in technical domains, where the quality and quantity of annotated data, significantly affect effectiveness of trained models. Employing humans is costly, especially when annotating for multi-label classification, as instances may bear multiple labels. Active Learning (AL) aims to alleviate annotation costs by intelligently selecting instances for annotation, rather than randomly annotating. Recent attention on transformers has spotlighted the potential of AL in this context. However, in practical settings, implementing AL faces challenges beyond theory. Notably, the gap between AL cycles presents idle time for annotators. To address this issue, we investigate alternative instance selection methods, aiming to maximize annotation efficiency by seamlessly integrating with the AL process. We begin by evaluating two existing methods in our transformer setting, employing respectively random sampling and outdated information. Following this we propose our novel method based on annotating instances to rebalance label distribution. Our approach mitigates biases, enhances model performance (up to 23% improvement on f1score), reduces strategy-dependent disparities (decrease of nearly 50% on standard deviation) and reduces label imbalance (decrease of 30% on Mean Imbalance Ratio).
Anthology ID:
2024.lrec-main.1190
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
13621–13632
Language:
URL:
https://aclanthology.org/2024.lrec-main.1190
DOI:
Bibkey:
Cite (ACL):
Maxime Arens, Lucile Callebert, Mohand Boughanem, and Jose G. Moreno. 2024. Rebalancing Label Distribution While Eliminating Inherent Waiting Time in Multi Label Active Learning Applied to Transformers. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13621–13632, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Rebalancing Label Distribution While Eliminating Inherent Waiting Time in Multi Label Active Learning Applied to Transformers (Arens et al., LREC-COLING 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.lrec-main.1190.pdf