Enhancing Phrase Representation by Information Bottleneck Guided Text Diffusion Process for Keyphrase Extraction

Yuanzhen Luo, Qingyu Zhou, Feng Zhou


Abstract
Keyphrase extraction (KPE) is an important task in Natural Language Processing for many scenarios, which aims to extract keyphrases that are present in a given document. Many existing supervised methods treat KPE as sequential labeling, span-level classification, or generative tasks. However, these methods lack the ability to utilize keyphrase information, which may result in biased results. In this study, we propose Diff-KPE, which leverages the supervised Variational Information Bottleneck (VIB) to guide the text diffusion process for generating enhanced keyphrase representations. Diff-KPE first generates the desired keyphrase embeddings conditioned on the entire document and then injects the generated keyphrase embeddings into each phrase representation. A ranking network and VIB are then optimized together with rank loss and classification loss, respectively. This design of Diff-KPE allows us to rank each candidate phrase by utilizing both the information of keyphrases and the document. Experiments show that Diff-KPE outperforms existing KPE methods on a large open domain keyphrase extraction benchmark, OpenKP, and a scientific domain dataset, KP20K.
Anthology ID:
2024.lrec-main.535
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
6036–6047
Language:
URL:
https://aclanthology.org/2024.lrec-main.535
DOI:
Bibkey:
Cite (ACL):
Yuanzhen Luo, Qingyu Zhou, and Feng Zhou. 2024. Enhancing Phrase Representation by Information Bottleneck Guided Text Diffusion Process for Keyphrase Extraction. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 6036–6047, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Enhancing Phrase Representation by Information Bottleneck Guided Text Diffusion Process for Keyphrase Extraction (Luo et al., LREC-COLING 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.lrec-main.535.pdf