Multivariate Attention Network for Image Captioning

Cited by: 1
Authors
Wang, Weixuan [1 ]
Chen, Zhihong [1 ]
Hu, Haifeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510006, Guangdong, Peoples R China
Keywords
Image captioning; Attention mechanism; Multimodal combination
DOI
10.1007/978-3-030-20876-9_37
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Recently, the attention mechanism has been used extensively in computer vision to understand images more deeply through selective local analysis. However, existing methods apply the attention mechanism in isolation, which leads to irrelevant or inaccurate words. To solve this problem, we propose a Multivariate Attention Network (MAN) for image captioning, which contains a content attention for identifying content information of objects, a position attention for locating the positions of important patches, and a minutia attention for preserving fine-grained information of target objects. Furthermore, we construct a Multivariate Residual Network (MRN) to integrate a more discriminative multimodal representation by modeling the projections and extracting relevant relations among visual information of different modalities. Our MAN is inspired by recent findings in neuroscience and is designed to mimic how the human brain processes visual information. Compared with previous methods, we exploit diverse visual information and several multimodal integration strategies, which significantly improves the performance of our model. Experimental results show that our MAN model outperforms state-of-the-art approaches on two benchmark datasets, MS-COCO and Flickr30K.
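To make the described architecture more concrete, the following is a minimal Python (PyTorch) sketch of the general idea stated in the abstract: three parallel attention branches over image region features whose outputs are fused with a residual connection. All names here (MultivariateAttention, branches, fuse) and the tensor shapes are illustrative assumptions of this sketch; it does not reproduce the paper's exact content, position, and minutia attention formulations or its Multivariate Residual Network.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultivariateAttention(nn.Module):
    """Illustrative three-branch attention over image region features.

    Assumptions of this sketch: `region_feats` is a (batch, num_regions, feat_dim)
    tensor of CNN region features and `hidden` is the (batch, hidden_dim) decoder
    state. The three branches stand in for content / position / minutia attention,
    and the linear fusion with a skip connection stands in for the paper's MRN.
    """

    def __init__(self, feat_dim: int, hidden_dim: int, att_dim: int = 512):
        super().__init__()
        # One scoring head per attention branch.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(feat_dim + hidden_dim, att_dim),
                          nn.Tanh(),
                          nn.Linear(att_dim, 1))
            for _ in range(3)
        ])
        # Simple residual-style fusion of the three attended vectors.
        self.fuse = nn.Linear(3 * feat_dim, feat_dim)

    def forward(self, region_feats: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        b, n, _ = region_feats.shape
        h = hidden.unsqueeze(1).expand(b, n, -1)        # broadcast decoder state
        pair = torch.cat([region_feats, h], dim=-1)     # (b, n, feat_dim + hidden_dim)

        attended = []
        for branch in self.branches:
            scores = branch(pair).squeeze(-1)           # (b, n) attention logits
            alpha = F.softmax(scores, dim=-1)           # attention weights over regions
            attended.append((alpha.unsqueeze(-1) * region_feats).sum(dim=1))

        fused = self.fuse(torch.cat(attended, dim=-1))  # combine the three branches
        return fused + attended[0]                      # residual skip connection


# Minimal usage example with random tensors.
if __name__ == "__main__":
    att = MultivariateAttention(feat_dim=2048, hidden_dim=512)
    regions = torch.randn(2, 36, 2048)   # e.g. 36 detected regions per image
    state = torch.randn(2, 512)          # current LSTM decoder state
    print(att(regions, state).shape)     # torch.Size([2, 2048])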
Pages: 587-602
Number of pages: 16