Multimodal and Multilingual Laughter Detection in Stand-Up Comedy Videos

Anna Kuznetsova, Carlo Strapparava


Abstract
This paper presents a novel multimodal, multilingual dataset in Russian and English, with a particular emphasis on laughter detection techniques. Data was collected from YouTube stand-up comedy videos with manually annotated subtitles, and our research covers data preparation and laughter labeling. We explore two laughter detection approaches from the literature: peak detection, which applies an energy-based algorithm to preprocessed voiceless audio, and a machine learning approach, which uses pretrained models to identify the presence and duration of laughter. While the machine learning approach currently outperforms peak detection in accuracy and generalization, the latter shows promise and warrants further study. Additionally, we explore unimodal and multimodal humor detection on the new dataset, showing that neural models capture humor effectively in both languages, even with textual data alone. Multimodal experiments indicate that even basic models benefit from visual data, which improves detection results. However, further research is needed to improve the quality of laughter detection labels and to fully understand the impact of different modalities in a multimodal and multilingual context.
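To make the energy-based idea concrete, the following is a minimal sketch of how loud spans could be located in audio from which speech has already been removed. It is not the authors' implementation: the function names, the frame and hop sizes, and the `mean + k * std` threshold rule are all assumptions chosen for illustration, and the input is assumed to be a mono NumPy array that has already been preprocessed.

```python
import numpy as np

def short_time_energy(signal, frame_len, hop):
    """Mean squared amplitude of each (possibly overlapping) frame."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([
        np.mean(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])

def detect_energy_segments(signal, sr, frame_sec=0.05, hop_sec=0.025, k=2.0):
    """Return (start_sec, end_sec) spans whose frame energy exceeds
    mean + k * std. The threshold rule is an illustrative assumption."""
    frame_len = int(round(frame_sec * sr))
    hop = int(round(hop_sec * sr))
    energy = short_time_energy(signal, frame_len, hop)
    threshold = energy.mean() + k * energy.std()
    active = energy > threshold
    # Pad with False so every active run has a rising and a falling edge.
    padded = np.concatenate(([False], active, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # index of first active frame in a run
    ends = np.flatnonzero(edges == -1)    # index one past the last active frame
    return [
        (s * hop / sr, ((e - 1) * hop + frame_len) / sr)
        for s, e in zip(starts, ends)
    ]
```

Each returned pair gives a candidate start time and duration for an audience reaction. A sketch like this has no notion of what the sound is, so applause or music would also be flagged, which may be one reason a pretrained classifier generalizes better.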
Anthology ID:
2024.lrec-main.1037
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
11884–11889
URL:
https://aclanthology.org/2024.lrec-main.1037
Cite (ACL):
Anna Kuznetsova and Carlo Strapparava. 2024. Multimodal and Multilingual Laughter Detection in Stand-Up Comedy Videos. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11884–11889, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Multimodal and Multilingual Laughter Detection in Stand-Up Comedy Videos (Kuznetsova & Strapparava, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1037.pdf