Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network

Cited by: 0
Authors
Ji, Jiayi [1]
Luo, Yunpeng [1]
Sun, Xiaoshuai [1, 2]
Chen, Fuhai [1]
Luo, Gen [1]
Wu, Yongjian [3]
Gao, Yue [4]
Ji, Rongrong [1, 2]
Affiliations
[1] Xiamen Univ, Sch Informat, Dept Artificial Intelligence, Media Analyt & Comp Lab, Xiamen, Peoples R China
[2] Xiamen Univ, Inst Artificial Intelligence, Xiamen, Peoples R China
[3] Tencent Youtu Lab, Xiamen, Peoples R China
[4] Tsinghua Univ, Beijing, Peoples R China
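Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract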
Transformer-based architectures have shown great success in image captioning, where object regions are encoded and then attended into vectorial representations that guide caption decoding. However, such vectorial representations contain only region-level information and ignore the global information that reflects the entire image, which limits the capacity for complex multi-modal reasoning in image captioning. In this paper, we introduce a Global Enhanced Transformer (termed GET) that extracts a more comprehensive global representation and uses it to adaptively guide the decoder toward high-quality captions. In GET, a Global Enhanced Encoder is designed to embed the global feature, and a Global Adaptive Decoder is designed to guide caption generation. The former models intra- and inter-layer global representation by taking advantage of the proposed Global Enhanced Attention and a layer-wise fusion module. The latter contains a Global Adaptive Controller that adaptively fuses the global information into the decoder to guide caption generation. Extensive experiments on the MS COCO dataset demonstrate the superiority of GET over many state-of-the-art methods.
Pages: 1655-1663
Number of pages: 9