Guimin Hu


2024

pdf bib
Towards Multi-modal Sarcasm Detection via Disentangled Multi-grained Multi-modal Distilling
Zhihong Zhu | Xuxin Cheng | Guimin Hu | Yaowei Li | Zhiqi Huang | Yuexian Zou
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Multi-modal sarcasm detection aims to identify whether a given sample with multi-modal information (i.e., text and image) is sarcastic, which has received increasing attention due to the rapid growth of multi-modal posts on modern social media. However, mainstream models process the input of each modality in a holistic manner, resulting in redundant and unrefined information. Moreover, the representations of different modalities are entangled in one common latent space to perform complex cross-modal interactions, neglecting the heterogeneity and distribution gap of different modalities. To address these issues, we propose a novel framework DMMD (short for Disentangled Multi-grained Multi-modal Distilling) for multi-modal sarcasm detection, which conducts multi-grained knowledge distilling (i.e., intra-subspace and inter-subspace) based on the disentangled multi-modal representations. Concretely, the representations of each modality are disentangled explicitly into modality-agnostic/specific subspaces. Then we transfer cross-modal knowledge by conducting intra-subspace knowledge distilling in a self-adaptive pattern. We also apply mutual learning to regularize the underlying inter-subspace consistency. Extensive experiments on a commonly used benchmark demonstrate the efficacy of our DMMD over cutting-edge methods. More encouragingly, visualization results indicate the multi-modal representations display meaningful distributional patterns, and we hope it will be helpful for the community of multi-modal knowledge transfer.

2022

pdf bib
UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition
Guimin Hu | Ting-En Lin | Yi Zhao | Guangming Lu | Yuchuan Wu | Yongbin Li
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for computers to understand human behaviors. From a psychological perspective, emotions are the expression of affect or feelings during a short period, while sentiments are formed and held for a longer period. However, most existing works study sentiment and emotion separately and do not fully exploit the complementary knowledge behind the two. In this paper, we propose a multimodal sentiment knowledge-sharing framework (UniMSE) that unifies MSA and ERC tasks from features, labels, and models. We perform modality fusion at the syntactic and semantic levels and introduce contrastive learning between modalities and samples to better capture the difference and consistency between sentiments and emotions. Experiments on four public benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the effectiveness of the proposed method and achieve consistent improvements compared with state-of-the-art methods.

2021

pdf bib
Bidirectional Hierarchical Attention Networks based on Document-level Context for Emotion Cause Extraction
Guimin Hu | Guangming Lu | Yi Zhao
Findings of the Association for Computational Linguistics: EMNLP 2021

Emotion cause extraction (ECE) aims to extract the causes behind the certain emotion in text. Some works related to the ECE task have been published and attracted lots of attention in recent years. However, these methods neglect two major issues: 1) pay few attentions to the effect of document-level context information on ECE, and 2) lack of sufficient exploration for how to effectively use the annotated emotion clause. For the first issue, we propose a bidirectional hierarchical attention network (BHA) corresponding to the specified candidate cause clause to capture the document-level context in a structured and dynamic manner. For the second issue, we design an emotional filtering module (EF) for each layer of the graph attention network, which calculates a gate score based on the emotion clause to filter the irrelevant information. Combining the BHA and EF, the EF-BHA can dynamically aggregate the contextual information from two directions and filters irrelevant information. The experimental results demonstrate that EF-BHA achieves the competitive performances on two public datasets in different languages (Chinese and English). Moreover, we quantify the effect of context on emotion cause extraction and provide the visualization of the interactions between candidate cause clauses and contexts.