People and Places of the Past - Named Entity Recognition in Swedish Labour Movement Documents from Historical Sources

Crina Tudor, Eva Pettersson


Abstract
Named Entity Recognition (NER) is an important step in many Natural Language Processing tasks. However, existing state-of-the-art NER systems are typically developed on contemporary data and are not well suited to analyzing historical text. In this paper, we present a comparative analysis of the performance of several language models when applied to Named Entity Recognition for historical Swedish text. The source texts we work with are documents from Swedish labour unions from the 19th and 20th centuries. We experiment with three off-the-shelf models for contemporary Swedish text, and one language model built on historical Swedish text that we fine-tune with labelled data to adapt it to the NER task. Lastly, we propose a hybrid approach that combines the results of two models in order to maximize usability. We show that, even though historical Swedish is a low-resource language with data sparsity issues affecting overall performance, historical language models still show very promising results. Further contributions of our paper are the release of our newly trained model for NER of historical Swedish text, along with a manually annotated corpus of over 650 named entities.
Anthology ID:
2024.latechclfl-1.17
Volume:
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Stan Szpakowicz
Venues:
LaTeCHCLfL | WS
Publisher:
Association for Computational Linguistics
Pages:
185–195
URL:
https://aclanthology.org/2024.latechclfl-1.17
Cite (ACL):
Crina Tudor and Eva Pettersson. 2024. People and Places of the Past - Named Entity Recognition in Swedish Labour Movement Documents from Historical Sources. In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), pages 185–195, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
People and Places of the Past - Named Entity Recognition in Swedish Labour Movement Documents from Historical Sources (Tudor & Pettersson, LaTeCHCLfL-WS 2024)
PDF:
https://aclanthology.org/2024.latechclfl-1.17.pdf
Supplementary material:
 2024.latechclfl-1.17.SupplementaryMaterial.zip
Video:
 https://aclanthology.org/2024.latechclfl-1.17.mp4