TAeKD: Teacher Assistant Enhanced Knowledge Distillation for Closed-Source Multilingual Neural Machine Translation

Bo Lv, Xin Liu, Kaiwen Wei, Ping Luo, Yue Yu


Abstract
Knowledge Distillation (KD) is an efficient method for transferring language knowledge from open-source large language models (LLMs) to more computationally efficient models. However, vanilla KD methods are difficult to apply when the teacher is a closed-source Multilingual Neural Machine Translation (MNMT) model built on LLMs: the teacher's soft labels and training data are inaccessible, which makes effective knowledge transfer hard to achieve. To address this issue, this paper proposes a Teacher Assistant enhanced Knowledge Distillation (TAeKD) method that strengthens knowledge transfer from closed-source MNMT models. Specifically, TAeKD designs a fusion model that integrates the translation outputs of multiple closed-source models to generate soft labels and training samples. Furthermore, a quality assessment learning mechanism is introduced to improve the generalization of the fusion model and raise the quality of the fused data used to train the student model. To facilitate research on knowledge transfer from MNMT models, we also introduce FuseData, a benchmark consisting of a blend of translations from multiple closed-source systems. Experimental results show that TAeKD outperforms previous state-of-the-art KD methods on both the WMT22 and FLORES-101 test sets.
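For readers skimming before opening the PDF, the snippet below is a minimal, hypothetical sketch of the distillation objective the abstract implies: the teacher-assistant (fusion) model, unlike the closed-source teachers, can expose its output distributions, so the student can be trained on a mix of hard cross-entropy over the fused reference translations and temperature-scaled KL divergence against the assistant's soft labels. This is a standard KD recipe in PyTorch, not the authors' released code; every name here (distill_step, assistant_logits, alpha, T) is an assumption for illustration.

    # Hypothetical sketch (not the paper's implementation): distill a student
    # from a teacher-assistant "fusion" model whose logits stand in for the
    # soft labels that the closed-source teachers cannot provide.
    import torch
    import torch.nn.functional as F

    def distill_step(student_logits, assistant_logits, target_ids,
                     alpha=0.5, T=2.0):
        # Hard-label loss against the fused reference translations
        # (the training samples generated by the fusion model).
        ce = F.cross_entropy(
            student_logits.view(-1, student_logits.size(-1)),
            target_ids.view(-1),
            ignore_index=-100,  # mask padding positions
        )
        # Soft-label loss: match the assistant's temperature-smoothed token
        # distributions; the T*T factor rescales gradients as in standard KD.
        kd = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(assistant_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        return alpha * ce + (1.0 - alpha) * kd

In the paper's setting, assistant_logits would presumably come from the fusion model trained on FuseData-style outputs, and the quality assessment mechanism would decide which fused translations become target_ids; those details are in the paper itself, not this sketch.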
Anthology ID: 2024.lrec-main.1350
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 15530–15541
URL: https://aclanthology.org/2024.lrec-main.1350
Cite (ACL): Bo Lv, Xin Liu, Kaiwen Wei, Ping Luo, and Yue Yu. 2024. TAeKD: Teacher Assistant Enhanced Knowledge Distillation for Closed-Source Multilingual Neural Machine Translation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 15530–15541, Torino, Italia. ELRA and ICCL.
Cite (Informal): TAeKD: Teacher Assistant Enhanced Knowledge Distillation for Closed-Source Multilingual Neural Machine Translation (Lv et al., LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.1350.pdf