Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning

Cited by: 13
Authors
Yang, Xu [1]
Gao, Chongyang [2]
Zhang, Hanwang [1]
Cai, Jianfei [3]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Dartmouth Coll, Hanover, NH 03755 USA
[3] Monash Univ, Melbourne, Vic, Australia
Keywords
Image Paragraph Generation; Scene Graph; Hierarchical Constrain; Hierarchical Scene Graph Encoder Decoder;
DOI
10.1145/3394171.3413859
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When humans describe an image in a long paragraph, we usually first compose an implicit mental "script" and then follow it to generate the paragraph. Inspired by this, we equip modern encoder-decoder-based image paragraph captioning models with this ability by proposing the Hierarchical Scene Graph Encoder-Decoder (HSGED) for generating coherent and distinctive paragraphs. In particular, we use the image scene graph as the "script" to incorporate rich semantic knowledge and, more importantly, hierarchical constraints into the model. Specifically, we design a sentence scene graph RNN (SSG-RNN) to generate sub-graph-level topics, which constrain the word scene graph RNN (WSG-RNN) to generate the corresponding sentences. We propose irredundant attention in SSG-RNN to increase the likelihood of abstracting topics from rarely described sub-graphs, and inheriting attention in WSG-RNN to generate more grounded sentences from the abstracted topics; both give rise to more distinctive paragraphs. An efficient sentence-level loss is also proposed to encourage the sequence of generated sentences to be similar to that of the ground-truth paragraphs. We validate HSGED on the Stanford image paragraph dataset and show that it not only achieves a new state-of-the-art CIDEr-D score of 36.02, but also generates more coherent and distinctive paragraphs under various metrics.
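The abstract does not give the formula for irredundant attention, so the following is only an illustrative sketch of the general idea it describes: steering sentence-level attention away from sub-graphs that earlier sentences already covered. It assumes a coverage-style penalty, which is a stand-in technique and not necessarily the authors' formulation. The names `irredundant_attention`, `plan_topics`, `coverage`, and `penalty` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def irredundant_attention(scores, coverage, penalty):
    """Hypothetical coverage-penalized attention over sub-graphs.

    scores:   raw relevance score per sub-graph
    coverage: attention mass each sub-graph received from earlier sentences
    penalty:  assumed strength of the redundancy penalty (not from the paper)
    """
    adjusted = [s - penalty * c for s, c in zip(scores, coverage)]
    return softmax(adjusted)


def plan_topics(scores, num_sentences, penalty):
    """Pick one topic (sub-graph index) per sentence, accumulating coverage."""
    coverage = [0.0] * len(scores)
    topics = []
    for _ in range(num_sentences):
        weights = irredundant_attention(scores, coverage, penalty)
        topics.append(max(range(len(weights)), key=weights.__getitem__))
        # Remember how much attention each sub-graph has received so far.
        coverage = [c + w for c, w in zip(coverage, weights)]
    return topics


if __name__ == "__main__":
    sub_graph_scores = [2.0, 1.5, 1.0]
    print(plan_topics(sub_graph_scores, num_sentences=3, penalty=0.0))
    print(plan_topics(sub_graph_scores, num_sentences=3, penalty=4.0))
```

With `penalty=0.0` every sentence attends to the same highest-scoring sub-graph; a positive penalty pushes later sentences toward sub-graphs that have received less attention, which is the kind of behavior the abstract credits with producing more distinctive paragraphs.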
Pages: 4181-4189
Page count: 9
Related papers
50 in total
  • [1] Deep Hierarchical Encoder-Decoder Network for Image Captioning
    Xiao, Xinyu
    Wang, Lingfeng
    Ding, Kun
    Xiang, Shiming
    Pan, Chunhong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (11) : 2942 - 2956
  • [2] Parallel encoder-decoder framework for image captioning
    Saeidimesineh, Reyhane
    Adibi, Peyman
    Karshenas, Hossein
    Darvishy, Alireza
    [J]. KNOWLEDGE-BASED SYSTEMS, 2023, 282
  • [3] Image Captioning: From Encoder-Decoder to Reinforcement Learning
    Tang, Yu
    [J]. 2022 6TH INTERNATIONAL CONFERENCE ON IMAGING, SIGNAL PROCESSING AND COMMUNICATIONS, ICISPC, 2022, : 6 - 10
  • [4] An Information Multiplexed Encoder-Decoder Network for Image Captioning in Hindi
    Mishra, Santosh Kumar
    Peethala, Mahesh Babu
    Saha, Sriparna
    Bhattacharyya, Pushpak
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 3019 - 3024
  • [5] The Optimal Choice of the Encoder-Decoder Model Components for Image Captioning
    Bartosiewicz, Mateusz
    Iwanowski, Marcin
    [J]. INFORMATION, 2024, 15 (08)
  • [6] Dense Video Captioning with Hierarchical Attention-Based Encoder-Decoder Networks
    Yu, Mingjing
    Zheng, Huicheng
    Liu, Zehua
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [7] Dynamic Convolution-based Encoder-Decoder Framework for Image Captioning in Hindi
    Mishra, Santosh Kumar
    Sinha, Sushant
    Saha, Sriparna
    Bhattacharyya, Pushpak
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (04)
  • [8] MICER: a pre-trained encoder-decoder architecture for molecular image captioning
    Yi, Jiacai
    Wu, Chengkun
    Zhang, Xiaochen
    Xiao, Xinyi
    Qiu, Yanlong
    Zhao, Wentao
    Hou, Tingjun
    Cao, Dongsheng
    [J]. BIOINFORMATICS, 2022, 38 (19) : 4562 - 4572
  • [9] Efficient Channel Attention Based Encoder-Decoder Approach for Image Captioning in Hindi
    Mishra, Santosh Kumar
    Rai, Gaurav
    Saha, Sriparna
    Bhattacharyya, Pushpak
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (03)
  • [10] Graph Regularized Encoder-Decoder Networks for Image Representation Learning
    Yang, Shijie
    Li, Liang
    Wang, Shuhui
    Zhang, Weigang
    Huang, Qingming
    Tian, Qi
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 3124 - 3136