Hierarchical Attention Network for Image Captioning

被引:0
|
作者
Wang, Weixuan [1 ]
Chen, Zhihong [1 ]
Hu, Haifeng [1 ]
机构
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510275, Guangdong, Peoples R China
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Recently, attention mechanism has been successfully applied in image captioning, but the existing attention methods are only established on low-level spatial features or high-level text features, which limits richness of captions. In this paper, we propose a Hierarchical Attention Network (HAN) that enables attention to be calculated on pyramidal hierarchy of features synchronously. The pyramidal hierarchy consists of features on diverse semantic levels, which allows predicting different words according to different features. On the other hand, due to the different modalities of features, a Multivariate Residual Module (MRM) is proposed to learn the joint representations from features. The MRM is able to model projections and extract relevant relations among different features. Furthermore, we introduce a context gate to balance the contribution of different features. Compared with the existing methods, our approach applies hierarchical features and exploits several multimodal integration strategies, which can significantly improve the performance. The HAN is verified on benchmark MSCOCO dataset, and the experimental results indicate that our model outperforms the state-of-the-art methods, achieving a BLEU1 score of 80.9 and a CIDEr score of 121.7 in the Karpathy's test split.
引用
收藏
页码:8957 / 8964
页数:8
相关论文
共 50 条
  • [1] Gated Hierarchical Attention for Image Captioning
    Wang, Qingzhong
    Chan, Antoni B.
    [J]. COMPUTER VISION - ACCV 2018, PT IV, 2019, 11364 : 21 - 37
  • [2] A Hierarchical Multimodal Attention-based Neural Network for Image Captioning
    Cheng, Yong
    Huang, Fei
    Zhou, Lian
    Jin, Cheng
    Zhang, Yuejie
    Zhang, Tao
    [J]. SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017, : 889 - 892
  • [3] Hybrid attention network for image captioning
    Jiang, Wenhui
    Li, Qin
    Zhan, Kun
    Fang, Yuming
    Shen, Fei
    [J]. DISPLAYS, 2022, 73
  • [4] Multivariate Attention Network for Image Captioning
    Wang, Weixuan
    Chen, Zhihong
    Hu, Haifeng
    [J]. COMPUTER VISION - ACCV 2018, PT VI, 2019, 11366 : 587 - 602
  • [5] Hierarchical Deep Neural Network for Image Captioning
    Su, Yuting
    Li, Yuqian
    Xu, Ning
    Liu, An-An
    [J]. NEURAL PROCESSING LETTERS, 2020, 52 (02) : 1057 - 1067
  • [6] Hierarchical Deep Neural Network for Image Captioning
    Yuting Su
    Yuqian Li
    Ning Xu
    An-An Liu
    [J]. Neural Processing Letters, 2020, 52 : 1057 - 1067
  • [7] A SEQUENTIAL GUIDING NETWORK WITH ATTENTION FOR IMAGE CAPTIONING
    Sow, Daouda
    Qin, Zengchang
    Niasse, Mouhamed
    Wan, Tao
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3802 - 3806
  • [8] Attention on Attention for Image Captioning
    Huang, Lun
    Wang, Wenmin
    Chen, Jie
    Wei, Xiao-Yong
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4633 - 4642
  • [9] Multimodal-enhanced hierarchical attention network for video captioning
    Zhong, Maosheng
    Chen, Youde
    Zhang, Hao
    Xiong, Hao
    Wang, Zhixiang
    [J]. MULTIMEDIA SYSTEMS, 2023, 29 (05) : 2469 - 2482
  • [10] Multimodal-enhanced hierarchical attention network for video captioning
    Maosheng Zhong
    Youde Chen
    Hao Zhang
    Hao Xiong
    Zhixiang Wang
    [J]. Multimedia Systems, 2023, 29 : 2469 - 2482