Unbabel-IST at the WMT Chat Translation Shared Task

João Alves; Pedro Henrique Martins; José G. C. de Souza; M. Amin Farajian; André F. T. Martins

doi:10.18653/v1/2022.wmt-1.89

Abstract

We present the joint contribution of IST and Unbabel to the WMT 2022 Chat Translation Shared Task. We participated in all six language directions (English ↔ German, English ↔ French, English ↔ Brazilian Portuguese). Because domain-specific data is scarce, we use mBART50, a large pretrained language model trained on millions of sentence pairs, as our base model. We fine-tune it in two steps: first on publicly available data, then on the validation set. With this domain-specific model in hand, we explore kNN-MT as a way of incorporating domain-specific data at decoding time.
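The abstract does not spell out how kNN-MT is configured, so the following is only a minimal sketch of the general kNN-MT idea: at each decoding step, the nearest entries in a datastore of (decoder state, target token) pairs are turned into a token distribution, which is then interpolated with the model's own distribution. The function names, the squared-L2 distance, and the parameters `k`, `temperature`, and `lam` are illustrative assumptions, not values or code from the paper.

```python
import math


def knn_distribution(query, datastore, k, temperature, vocab_size):
    """Build a token distribution from the k datastore entries nearest to `query`.

    `datastore` is a list of (key_vector, target_token_id) pairs (hypothetical layout).
    """
    # Squared L2 distance from the query to every key, keeping the k smallest.
    scored = sorted(
        (
            (sum((q - x) ** 2 for q, x in zip(query, key)), token)
            for key, token in datastore
        ),
        key=lambda pair: pair[0],
    )[:k]

    # Softmax over negative distances; subtracting the minimum avoids underflow.
    min_dist = scored[0][0]
    weights = [math.exp(-(dist - min_dist) / temperature) for dist, _ in scored]
    total = sum(weights)

    # Neighbours that share a target token pool their probability mass.
    probs = [0.0] * vocab_size
    for weight, (_, token) in zip(weights, scored):
        probs[token] += weight / total
    return probs


def interpolate(p_model, p_knn, lam):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_model."""
    return [lam * pk + (1.0 - lam) * pm for pm, pk in zip(p_model, p_knn)]
```

In this sketch, `lam = 0` recovers the fine-tuned model unchanged, while larger values give more weight to the retrieved in-domain examples.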

Anthology ID: 2022.wmt-1.89
Volume: Proceedings of the Seventh Conference on Machine Translation (WMT)
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates (Hybrid)
Editors: Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
Venue: WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 943–948
URL: https://aclanthology.org/2022.wmt-1.89/
DOI: 10.18653/v1/2022.wmt-1.89
Cite (ACL): João Alves, Pedro Henrique Martins, José G. C. de Souza, M. Amin Farajian, and André F. T. Martins. 2022. Unbabel-IST at the WMT Chat Translation Shared Task. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 943–948, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal): Unbabel-IST at the WMT Chat Translation Shared Task (Alves et al., WMT 2022)
PDF: https://aclanthology.org/2022.wmt-1.89.pdf
