MIXED KNOWLEDGE RELATION TRANSFORMER FOR IMAGE CAPTIONING

Authors
Chen, Tianyu [1]
Li, Zhixin [1]
Wei, Jiahui [1]
Xian, Tiantao [1]
Affiliation
[1] Guangxi Normal University, Guangxi Key Lab of Multi-Source Information Mining & Security, Guilin 541004, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
image captioning; external knowledge; object relation; language
DOI
10.1109/ICASSP43922.2022.9747541
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Modeling the internal relationships among image objects has contributed significantly to the development of image captioning, especially when combined with the Transformer architecture. However, most of these methods only compute the relationships between entities and ignore the information shared between entities and the background. Moreover, the ways of exploring relational information inside an image can be extended further. In this paper, we further explore the relationships between objects from both internal and external-knowledge perspectives, and embed vital global image information into the internal relationship module. To validate the effectiveness of our model, we conduct extensive experiments on the widely used MSCOCO dataset and achieve state-of-the-art performance on both the online and offline test sets.
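The abstract mentions embedding global image information into a relation module so that entities are not modeled in isolation from the background. The snippet below is a minimal, hypothetical sketch of one generic way to do this, not the paper's formulation: scaled dot-product self-attention over region features in which every region can also attend to one extra global token. The global token is assumed here to be the mean of the region features; the helper names (`matmul`, `softmax`, `attention_with_global`, `rand_matrix`), the dimensions, and the random inputs are all illustrative assumptions.

```python
import math
import random

def matmul(a, b):
    # Multiply an (n x m) matrix by an (m x p) matrix, both as lists of lists.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention_with_global(regions, w_q, w_k, w_v):
    """Self-attention over N region features plus one assumed global token.

    regions: N x D features; w_q, w_k, w_v: D x D_K projections.
    Returns (N x D_K updated features, N x (N+1) attention weights).
    """
    n, d = len(regions), len(regions[0])
    # Hypothetical global token: the mean of all region features.
    global_feat = [sum(r[j] for r in regions) / n for j in range(d)]
    memory = regions + [global_feat]          # keys/values come from N+1 rows
    q = matmul(regions, w_q)                  # queries come only from regions
    k = matmul(memory, w_k)
    v = matmul(memory, w_v)
    d_k = len(w_k[0])
    weights = [softmax([sum(qi[t] * kj[t] for t in range(d_k)) / math.sqrt(d_k)
                        for kj in k])
               for qi in q]
    return matmul(weights, v), weights

def rand_matrix(rows, cols):
    # Random Gaussian matrix used only for this illustration.
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
N, D, D_K = 5, 16, 8
regions = rand_matrix(N, D)
w_q, w_k, w_v = rand_matrix(D, D_K), rand_matrix(D, D_K), rand_matrix(D, D_K)
out, attn = attention_with_global(regions, w_q, w_k, w_v)
print(len(out), len(out[0]), len(attn[0]))  # 5 8 6
```

Because the global token is appended as an extra key/value row, each region's attention distribution has N+1 entries, letting it draw on whole-image context alongside the other regions. The paper may instead use learned global features or relation-specific attention biases.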
Pages: 4403-4407
Number of pages: 5