Junjia Feng


2024

Autonomous Aspect-Image Instruction a2II: Q-Former Guided Multimodal Sentiment Classification
Junjia Feng | Mingqian Lin | Lin Shang | Xiaoying Gao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The multimodal aspect-based sentiment classification (MABSC) task, which aims to identify the sentiment polarity of each aspect by combining language and vision information, has garnered significant attention. However, the limited multimodal data available for this task remains a major obstacle to effective vision-language fusion. While large-scale vision-language pretrained models have been adapted to many tasks, their use for the MABSC task is still in a nascent stage. In this work, we apply the instruction tuning paradigm to the MABSC task and leverage the ability of large vision-language models to alleviate the limitations in fusing textual and image modalities. To tackle the problem of potential irrelevance between aspects and images, we propose a plug-and-play selector that autonomously chooses the most appropriate instruction from an instruction pool, thereby reducing the impact of irrelevant image noise on the final sentiment classification results. We conduct extensive experiments in various scenarios, and our model achieves state-of-the-art performance on benchmark datasets as well as in few-shot settings.
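
The abstract describes a selector that picks an instruction from a pool based on how relevant the image is to the aspect. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: it assumes precomputed aspect and image embeddings from a shared vision-language encoder (e.g. CLIP), a two-template instruction pool, and an untuned similarity threshold, all of which are stand-ins chosen for illustration.

```python
# Hypothetical sketch of a plug-and-play instruction selector (not the paper's code).
# It scores aspect-image relevance with cosine similarity over precomputed embeddings
# and picks an instruction template from a small pool accordingly.
import torch
import torch.nn.functional as F

# Assumed instruction pool: one template that uses the image and one that ignores it,
# so the selector can fall back to a text-only prompt when the image looks irrelevant.
INSTRUCTION_POOL = [
    "Classify the sentiment toward the aspect '{aspect}' using both the sentence and the image.",
    "Classify the sentiment toward the aspect '{aspect}' using the sentence only; ignore the image.",
]

def select_instruction(aspect_emb: torch.Tensor,
                       image_emb: torch.Tensor,
                       aspect: str,
                       threshold: float = 0.25) -> str:
    """Pick an instruction based on aspect-image cosine similarity.

    aspect_emb and image_emb are 1-D embeddings from any shared vision-language
    encoder; the threshold is an illustrative value, not a tuned one.
    """
    sim = F.cosine_similarity(aspect_emb.unsqueeze(0), image_emb.unsqueeze(0)).item()
    template = INSTRUCTION_POOL[0] if sim >= threshold else INSTRUCTION_POOL[1]
    return template.format(aspect=aspect)

if __name__ == "__main__":
    # Random vectors stand in for real encoder outputs in this toy example.
    torch.manual_seed(0)
    aspect_vec, image_vec = torch.randn(512), torch.randn(512)
    print(select_instruction(aspect_vec, image_vec, aspect="battery life"))
```

The selected instruction would then be fed, together with the sentence and (optionally) the image features, to the instruction-tuned vision-language model for the final sentiment prediction; the paper's actual selector and Q-Former-based guidance are more involved than this thresholded similarity check.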