Cross-Lingual Transfer Robustness to Lower-Resource Languages on Adversarial Datasets

Shadi Manafi, Nikhil Krishnaswamy


Abstract
Multilingual Language Models (MLLMs) exhibit robust cross-lingual transfer capabilities, i.e., the ability to leverage information acquired in a source language and apply it to a target language. These capabilities find practical applications in well-established Natural Language Processing (NLP) tasks such as Named Entity Recognition (NER). This study investigates how effectively knowledge from a source language transfers to a target language, particularly when the input test set is perturbed. We evaluate 13 pairs of languages, each comprising one high-resource language (HRL) and one low-resource language (LRL) related geographically, genetically, or through borrowing. We evaluate two well-known MLLMs (MBERT and XLM-R) on these pairs, in native LRL and cross-lingual transfer settings, on two tasks, under a set of different perturbations. Our findings indicate that cross-lingual transfer in NER depends largely on the overlap of entity chunks: the more entities a source and target language share, the stronger the transfer. Models using cross-lingual transfer also appear somewhat more robust to certain perturbations of the input, perhaps indicating an ability to leverage stronger representations derived from the HRL. Our research provides valuable insights into cross-lingual transfer and its implications for NLP applications, and underscores the need to consider linguistic nuances and potential limitations when employing MLLMs across distinct languages.
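To make the abstract's key quantity concrete, the sketch below shows one plausible way to measure entity-chunk overlap between a source-language and target-language NER dataset, together with a simple adjacent-character-swap perturbation of the kind used in adversarial robustness evaluations. This is a minimal illustration, not the authors' released code: the CoNLL-style input format, the Jaccard overlap measure, and all function names (read_conll, entity_chunks, chunk_overlap, swap_perturb) are assumptions for demonstration only.

```python
# Minimal sketch (not the authors' code): estimating entity-chunk overlap
# between two CoNLL-style NER datasets, plus one simple input perturbation.
# File format, overlap measure, and function names are illustrative assumptions.
import random


def read_conll(path):
    """Yield sentences as lists of (token, BIO-tag) pairs from a CoNLL file."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                if sentence:
                    yield sentence
                    sentence = []
                continue
            cols = line.split()
            sentence.append((cols[0], cols[-1]))  # token in first col, tag in last
    if sentence:
        yield sentence


def entity_chunks(sentences):
    """Collect the surface forms of all BIO-tagged entity chunks."""
    chunks, current = set(), []
    for sent in sentences:
        for token, tag in sent:
            if tag.startswith("B-"):
                if current:
                    chunks.add(" ".join(current))
                current = [token]
            elif tag.startswith("I-") and current:
                current.append(token)
            else:  # "O" tag (or stray "I-") closes any open chunk
                if current:
                    chunks.add(" ".join(current))
                current = []
        if current:
            chunks.add(" ".join(current))
            current = []
    return chunks


def chunk_overlap(src_path, tgt_path):
    """Jaccard overlap of entity surface forms between two datasets."""
    src = entity_chunks(read_conll(src_path))
    tgt = entity_chunks(read_conll(tgt_path))
    union = src | tgt
    return len(src & tgt) / len(union) if union else 0.0


def swap_perturb(token, rng=random):
    """Swap two adjacent characters: one simple adversarial perturbation."""
    if len(token) < 2:
        return token
    i = rng.randrange(len(token) - 1)
    return token[:i] + token[i + 1] + token[i] + token[i + 2:]
```

Under these assumptions, a call such as chunk_overlap("en.conll", "mt.conll") (file names hypothetical) returns a value in [0, 1] that could then be correlated with cross-lingual transfer performance across the 13 language pairs, in the spirit of the paper's finding about shared entities.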
Anthology ID: 2024.lrec-main.372
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 4174–4184
URL: https://aclanthology.org/2024.lrec-main.372
Cite (ACL): Shadi Manafi and Nikhil Krishnaswamy. 2024. Cross-Lingual Transfer Robustness to Lower-Resource Languages on Adversarial Datasets. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4174–4184, Torino, Italia. ELRA and ICCL.
Cite (Informal): Cross-Lingual Transfer Robustness to Lower-Resource Languages on Adversarial Datasets (Manafi & Krishnaswamy, LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.372.pdf