@inproceedings{wang-etal-2022-multimodal-simultaneous,
    title = "A Multimodal Simultaneous Interpretation Prototype: Who Said What",
    author = "Wang, Xiaolin and
      Utiyama, Masao and
      Sumita, Eiichiro",
    editor = "Campbell, Janice and
      Larocca, Stephen and
      Marciano, Jay and
      Savenkov, Konstantin and
      Yanishevsky, Alex",
    booktitle = "Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)",
    month = sep,
    year = "2022",
    address = "Orlando, USA",
    publisher = "Association for Machine Translation in the Americas",
    url = "https://aclanthology.org/2022.amta-upg.10",
    pages = "132--143",
abstract = "{``}Who said what{''} is essential for users to understand video streams that have more than one speaker, but conventional simultaneous interpretation systems merely present {``}what was said{''} in the form of subtitles. Because the translations unavoidably have delays and errors, users often find it difficult to trace the subtitles back to speakers. To address this problem, we propose a multimodal SI system that presents users {``}who said what{''}. Our system takes audio-visual approaches to recognize the speaker of each sentence, and then annotates its translation with the textual tag and face icon of the speaker, so that users can quickly understand the scenario. Furthermore, our system is capable of interpreting video streams in real-time on a single desktop equipped with two Quadro RTX 4000 GPUs owing to an efficient sentence-based architecture.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="wang-etal-2022-multimodal-simultaneous">
    <titleInfo>
      <title>A Multimodal Simultaneous Interpretation Prototype: Who Said What</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Xiaolin</namePart>
      <namePart type="family">Wang</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Masao</namePart>
      <namePart type="family">Utiyama</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Eiichiro</namePart>
      <namePart type="family">Sumita</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2022-09</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Janice</namePart>
        <namePart type="family">Campbell</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Stephen</namePart>
        <namePart type="family">Larocca</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Jay</namePart>
        <namePart type="family">Marciano</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Konstantin</namePart>
        <namePart type="family">Savenkov</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Alex</namePart>
        <namePart type="family">Yanishevsky</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Machine Translation in the Americas</publisher>
        <place>
          <placeTerm type="text">Orlando, USA</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>“Who said what” is essential for users to understand video streams that have more than one speaker, but conventional simultaneous interpretation (SI) systems merely present “what was said” in the form of subtitles. Because the translations unavoidably have delays and errors, users often find it difficult to trace the subtitles back to speakers. To address this problem, we propose a multimodal SI system that presents “who said what” to users. Our system uses audio-visual approaches to recognize the speaker of each sentence, and then annotates its translation with the textual tag and face icon of the speaker, so that users can quickly understand the scenario. Furthermore, our system is capable of interpreting video streams in real time on a single desktop equipped with two Quadro RTX 4000 GPUs, owing to an efficient sentence-based architecture.</abstract>
    <identifier type="citekey">wang-etal-2022-multimodal-simultaneous</identifier>
    <location>
      <url>https://aclanthology.org/2022.amta-upg.10</url>
    </location>
    <part>
      <date>2022-09</date>
      <extent unit="page">
        <start>132</start>
        <end>143</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T A Multimodal Simultaneous Interpretation Prototype: Who Said What
%A Wang, Xiaolin
%A Utiyama, Masao
%A Sumita, Eiichiro
%Y Campbell, Janice
%Y Larocca, Stephen
%Y Marciano, Jay
%Y Savenkov, Konstantin
%Y Yanishevsky, Alex
%S Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)
%D 2022
%8 September
%I Association for Machine Translation in the Americas
%C Orlando, USA
%F wang-etal-2022-multimodal-simultaneous
%X “Who said what” is essential for users to understand video streams that have more than one speaker, but conventional simultaneous interpretation (SI) systems merely present “what was said” in the form of subtitles. Because the translations unavoidably have delays and errors, users often find it difficult to trace the subtitles back to speakers. To address this problem, we propose a multimodal SI system that presents “who said what” to users. Our system uses audio-visual approaches to recognize the speaker of each sentence, and then annotates its translation with the textual tag and face icon of the speaker, so that users can quickly understand the scenario. Furthermore, our system is capable of interpreting video streams in real time on a single desktop equipped with two Quadro RTX 4000 GPUs, owing to an efficient sentence-based architecture.
%U https://aclanthology.org/2022.amta-upg.10
%P 132-143
Markdown (Informal)
[A Multimodal Simultaneous Interpretation Prototype: Who Said What](https://aclanthology.org/2022.amta-upg.10) (Wang et al., AMTA 2022)

ACL
Xiaolin Wang, Masao Utiyama, and Eiichiro Sumita. 2022. A Multimodal Simultaneous Interpretation Prototype: Who Said What. In Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track), pages 132–143, Orlando, USA. Association for Machine Translation in the Americas.
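
For readers skimming the abstract, a minimal sketch of the output it describes (each translated sentence annotated with a speaker tag and face icon, so subtitles convey "who said what") might look like the following. This is an illustrative assumption only, not the authors' implementation; the class and function names are hypothetical, and the upstream audio-visual speaker recognition is assumed to have already produced a speaker label per sentence.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Subtitle:
    """One translated sentence plus the speaker recognized for it."""
    text: str                      # translated sentence shown to the user
    speaker_tag: str               # textual tag, e.g. "Speaker A" or a known name
    face_icon_path: Optional[str]  # path to the speaker's face icon, if available

def annotate(translations: List[str],
             speaker_ids: List[str],
             icons: Dict[str, str]) -> List[Subtitle]:
    """Pair each translated sentence with the speaker label assigned to it
    (hypothetically by an upstream audio-visual recognition step) and an icon."""
    return [
        Subtitle(text=t, speaker_tag=s, face_icon_path=icons.get(s))
        for t, s in zip(translations, speaker_ids)
    ]

if __name__ == "__main__":
    subs = annotate(
        ["Hello, everyone.", "Thank you for joining."],
        ["Speaker A", "Speaker B"],
        {"Speaker A": "icons/a.png", "Speaker B": "icons/b.png"},
    )
    for s in subs:
        print(f"[{s.speaker_tag}] {s.text}")
```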