Enhancing Descriptive Image Captioning with Natural Language Inference

Zhan Shi; Hui Liu; Xiaodan Zhu

doi:10.18653/v1/2021.acl-short.36

Abstract

Generating descriptive sentences that convey non-trivial, detailed, and salient information about images is an important goal of image captioning. In this paper we propose a novel approach to encourage captioning models to produce more detailed captions using natural language inference, based on the motivation that, among different captions of an image, descriptive captions are more likely to entail less descriptive captions. Specifically, we construct directed inference graphs for reference captions based on natural language inference. A PageRank algorithm is then employed to estimate the descriptiveness score of each node. Built on that, we use reference sampling and weighted designated rewards to guide captioning to generate descriptive captions. The results on MSCOCO show that the proposed method outperforms the baselines significantly on a wide range of conventional and descriptiveness-related evaluation metrics.
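
The graph-scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The edge direction (from an entailed caption to the caption that entails it), the use of entailment probabilities as edge weights, the hand-written probabilities, and the names `pagerank`, `descriptiveness_scores`, and `entail_prob` are all assumptions made for illustration; in practice the probabilities would come from an NLI model.

```python
def pagerank(weights, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on a dense weighted adjacency matrix.

    weights[i][j] is the weight of the directed edge i -> j.
    """
    n = len(weights)
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Mass held by nodes without outgoing edges is spread uniformly.
        dangling = sum(scores[i] for i in range(n) if out_sums[i] == 0)
        new_scores = []
        for j in range(n):
            inflow = sum(
                scores[i] * weights[i][j] / out_sums[i]
                for i in range(n)
                if out_sums[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * (inflow + dangling / n))
        converged = sum(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


def descriptiveness_scores(entail_prob):
    """Score captions from pairwise entailment probabilities.

    entail_prob[i][j] is an assumed probability that caption i entails
    caption j. Assumption: each entailed caption j "votes" for the caption
    i that entails it, so the edge j -> i carries that probability.
    """
    n = len(entail_prob)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[j][i] = entail_prob[i][j]
    return pagerank(weights)


# Hypothetical reference captions for one image, most to least detailed.
captions = [
    "a man riding a brown horse on a beach at sunset",
    "a man riding a horse",
    "a person with an animal",
]
# Hand-written placeholder probabilities, not the output of a real NLI model.
entail_prob = [
    [0.00, 0.90, 0.90],
    [0.10, 0.00, 0.80],
    [0.05, 0.10, 0.00],
]
scores = descriptiveness_scores(entail_prob)
for caption, score in sorted(zip(captions, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {caption}")
```

Under these assumptions the most detailed caption receives the highest score, because the less detailed captions it entails pass their mass to it. Scores of this kind could then act as sampling weights over references, for example with `random.choices(captions, weights=scores)`, although the abstract does not specify the exact sampling or reward scheme.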

Anthology ID: 2021.acl-short.36
Volume: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month: August
Year: 2021
Address: Online
Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues: ACL | IJCNLP
Publisher: Association for Computational Linguistics
Pages: 269–277
URL: https://aclanthology.org/2021.acl-short.36/
DOI: 10.18653/v1/2021.acl-short.36
Cite (ACL): Zhan Shi, Hui Liu, and Xiaodan Zhu. 2021. Enhancing Descriptive Image Captioning with Natural Language Inference. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 269–277, Online. Association for Computational Linguistics.
Cite (Informal): Enhancing Descriptive Image Captioning with Natural Language Inference (Shi et al., ACL-IJCNLP 2021)
PDF: https://aclanthology.org/2021.acl-short.36.pdf
Video: https://aclanthology.org/2021.acl-short.36.mp4
Code: gitsamshi/nli-image-caption
Data: MS COCO, SNLI