Emotags: Computer-Assisted Verbal Labelling of Expressive Audiovisual Utterances for Expressive Multimodal TTS

Gérard Bailly, Romain Legrand, Martin Lenglet, Frédéric Elisei, Maëva Hueber, Olivier Perrotin


Abstract
We developped a web app for ascribing verbal descriptions to expressive audiovisual utterances. These descriptions are limited to lists of adjectives that are either suggested via a navigation in emotional latent spaces built using discriminant analysis of BERT embeddings or entered freely by subjects. We show that such verbal descriptions collected on-line via Prolific on massive data (310 participants, 12620 labelled utterances up-to-now) provide Expressive Multimodal Text-to-Speech Synthesis with precise verbal control over desired emotional content
Anthology ID:
2024.lrec-main.505
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
5689–5695
Language:
URL:
https://aclanthology.org/2024.lrec-main.505
DOI:
Bibkey:
Cite (ACL):
Gérard Bailly, Romain Legrand, Martin Lenglet, Frédéric Elisei, Maëva Hueber, and Olivier Perrotin. 2024. Emotags: Computer-Assisted Verbal Labelling of Expressive Audiovisual Utterances for Expressive Multimodal TTS. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 5689–5695, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Emotags: Computer-Assisted Verbal Labelling of Expressive Audiovisual Utterances for Expressive Multimodal TTS (Bailly et al., LREC-COLING 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.lrec-main.505.pdf