Converting Legacy Data to CLDF: A FAIR Exit Strategy for Linguistic Web Apps

Robert Forkel, Daniel G. Swanson, Steven Moran


Abstract
In the mid 2000s, there were several large-scale US National Science Foundation (NSF) grants awarded to projects aiming at developing digital infrastructure and standards for different forms of linguistics data. For example, MultiTree encoded language family trees as phylogenies in XML and LL-MAP converted detailed geographic maps of endangered languages into KML. As early stand-alone website applications, these projects allowed researchers interested in comparative linguistics to explore language genealogies and areality, respectively. However as time passed, the technologies that supported these web apps became deprecated, unsupported, and inaccessible. Here we take a future-oriented approach to digital obsolescence and illustrate how to convert legacy linguistic resources into FAIR data via the Cross-Linguistic Data Formats (CLDF). CLDF is built on the W3C recommendations Model for Tabular Data and Metadata on the Web and Metadata Vocabulary for Tabular Data developed by the CSVW (CSV on the Web) working group. Thus, each dataset is modeled as a set of tabular data files described by metadata in JSON. These standards and the tools built to validate and manipulate them provide an accessible and extensible format for converting legacy linguistic web apps into FAIR datasets.
Anthology ID:
2024.lrec-main.353
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
3978–3982
Language:
URL:
https://aclanthology.org/2024.lrec-main.353
DOI:
Bibkey:
Cite (ACL):
Robert Forkel, Daniel G. Swanson, and Steven Moran. 2024. Converting Legacy Data to CLDF: A FAIR Exit Strategy for Linguistic Web Apps. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3978–3982, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Converting Legacy Data to CLDF: A FAIR Exit Strategy for Linguistic Web Apps (Forkel et al., LREC-COLING 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.lrec-main.353.pdf