UDMorph: Morphosyntactically Tagged UD Corpora

Maarten Janssen


Abstract
UDMorph provides an infrastructure parallel to that of UD for annotated corpus data that follow the UD guidelines but do not provide dependency relations: a place where new annotated datasets can be deposited, and where existing datasets can be found and downloaded. It also provides a corpus creation environment for easily creating annotated data for additional languages, and a REST and GUI interface to a growing collection of taggers with CoNLL-U output, currently covering around 150 languages.
Anthology ID:
2024.lrec-main.1472
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
16933–16940
URL:
https://aclanthology.org/2024.lrec-main.1472
Cite (ACL):
Maarten Janssen. 2024. UDMorph: Morphosyntactically Tagged UD Corpora. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 16933–16940, Torino, Italia. ELRA and ICCL.
Cite (Informal):
UDMorph: Morphosyntactically Tagged UD Corpora (Janssen, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1472.pdf