Seeing Eye-to-Eye: Cross-Modal Coherence Relations Inform Eye-gaze Patterns During Comprehension & Production

Mert Inan, Malihe Alikhani


Abstract
Context influences how we engage with multimodal documents. Describing and processing the content of images is highly correlated with the goals of the discourse. It is known that these underlying cognitive processes can be tapped into by looking at eye movements, but the connection between discourse goals and eye movements is a missing link. In this study, we carry out both augmented reality and webcam-based eye-tracking experiments during comprehension and production tasks. We build on computational frameworks of coherence in text and images that study causal, logical, elaborative, and temporal inferences to understand how eye gaze patterns and coherence relations influence each other. Because no state-of-the-art techniques exist to analyze eye movements in multimodal language settings, we introduce a new eye gaze pattern ranking algorithm and a semantic gaze visualization technique to study this phenomenon. Our results demonstrate that eye gaze durations are person-dependent and that, during comprehension and production, ranked gaze patterns differ significantly across types of coherence relations. We also present a case study of how Multimodal Large Language Models represent this connection between eye gaze patterns and coherence relations. We make all of our code and novel analysis tools available at https://github.com/Merterm/eye-gaze-coherence.
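
The abstract names a gaze pattern ranking algorithm but does not spell it out; as a rough illustration only, the sketch below ranks fixation n-grams over regions of interest (ROIs), grouped per coherence relation. All names here (the trial data, ROI labels, rank_gaze_patterns) are hypothetical assumptions for illustration and are not taken from the authors' released code.

    from collections import Counter
    from itertools import islice

    # Hypothetical trials: each gaze sample is labeled with the region of
    # interest it falls on (e.g., "image" vs. "text" areas of a document),
    # paired with the coherence relation of the trial. Illustrative only.
    trials = [
        ("elaboration", ["image", "text", "image", "image", "text"]),
        ("elaboration", ["text", "image", "image", "text", "text"]),
        ("temporal",    ["text", "text", "image", "text", "image"]),
    ]

    def ngrams(seq, n):
        """Yield all contiguous n-grams of a fixation sequence."""
        return zip(*(islice(seq, i, None) for i in range(n)))

    def rank_gaze_patterns(trials, n=2):
        """Rank fixation n-grams by frequency, per coherence relation."""
        counts = {}
        for relation, fixations in trials:
            counts.setdefault(relation, Counter()).update(ngrams(fixations, n))
        # Sort each relation's patterns from most to least frequent.
        return {rel: c.most_common() for rel, c in counts.items()}

    if __name__ == "__main__":
        for relation, ranked in rank_gaze_patterns(trials).items():
            print(relation, ranked)

Running this prints, for each relation, its gaze-transition bigrams ordered by frequency, which is one simple way such ranked patterns could then be compared across coherence relation types.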
Anthology ID:
2024.lrec-main.1263
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
14494–14512
URL:
https://aclanthology.org/2024.lrec-main.1263
Cite (ACL):
Mert Inan and Malihe Alikhani. 2024. Seeing Eye-to-Eye: Cross-Modal Coherence Relations Inform Eye-gaze Patterns During Comprehension & Production. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14494–14512, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Seeing Eye-to-Eye: Cross-Modal Coherence Relations Inform Eye-gaze Patterns During Comprehension & Production (Inan & Alikhani, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1263.pdf