Hierarchical Attention Network for Image Captioning

Times cited: 0

Authors:

Wang, Weixuan [1]

Chen, Zhihong [1]

Hu, Haifeng [1]

Affiliation:

[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou 510275, Guangdong, Peoples R China

Source:

THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2019

Keywords:

DOI:

Not available

CLC number (Chinese Library Classification):

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Recently, attention mechanisms have been successfully applied to image captioning, but existing attention methods are built only on low-level spatial features or on high-level text features, which limits the richness of the generated captions. In this paper, we propose a Hierarchical Attention Network (HAN) that computes attention over a pyramidal hierarchy of features simultaneously. The pyramidal hierarchy consists of features at diverse semantic levels, which allows different words to be predicted from different features. Moreover, because the features come from different modalities, a Multivariate Residual Module (MRM) is proposed to learn joint representations from them. The MRM models projections and extracts relevant relations among the different features. Furthermore, we introduce a context gate to balance the contributions of the different features. Compared with existing methods, our approach uses hierarchical features and exploits several multimodal integration strategies, which significantly improves performance. HAN is evaluated on the benchmark MSCOCO dataset, and the experimental results show that our model outperforms state-of-the-art methods, achieving a BLEU-1 score of 80.9 and a CIDEr score of 121.7 on the Karpathy test split.
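The abstract does not give HAN's equations, so the following is only a minimal sketch of the general idea, under stated assumptions: scaled dot-product soft attention is applied separately to each level of a feature hierarchy, and a softmax over per-level gate scores stands in for the paper's context gate when mixing the resulting context vectors. The function names (`attend`, `gated_hierarchical_context`) and the `gate_weights` parameters are hypothetical, and the Multivariate Residual Module is not modeled at all.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, features: List[Vector]) -> Vector:
    """Scaled dot-product soft attention over one level of the hierarchy.

    Assumes every feature vector has the same length as the query.
    """
    d = len(query)
    weights = softmax([dot(query, f) / math.sqrt(d) for f in features])
    # Context vector = attention-weighted sum of the feature vectors.
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(d)]


def gated_hierarchical_context(
    query: Vector, levels: List[List[Vector]], gate_weights: List[Vector]
) -> Vector:
    """Attend to each feature level separately, then mix the per-level contexts.

    `levels` might hold, e.g., low-level spatial features and high-level
    semantic features. `gate_weights` is a hypothetical per-level parameter
    vector; a softmax over the gate scores is an assumed stand-in for HAN's
    context gate, not the paper's formulation.
    """
    contexts = [attend(query, feats) for feats in levels]
    gates = softmax([dot(g, query) for g in gate_weights])
    d = len(query)
    return [sum(g * c[i] for g, c in zip(gates, contexts)) for i in range(d)]


# Toy example: two levels of 2-d features and untrained (zero) gate weights.
query = [1.0, 0.0]
levels = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
context = gated_hierarchical_context(query, levels, [[0.0, 0.0], [0.0, 0.0]])
print(context)
```

In an actual captioning decoder the query would be the hidden state at each time step, and both the attention projections and the gate parameters would be learned; here they are fixed lists so the mixing of per-level contexts is easy to follow.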

Pages: 8957-8964

Number of pages: 8

[1] Gated Hierarchical Attention for Image Captioning
Wang, Qingzhong
Chan, Antoni B.
[J]. COMPUTER VISION - ACCV 2018, PT IV, 2019, 11364 : 21 - 37
[2] A Hierarchical Multimodal Attention-based Neural Network for Image Captioning
Cheng, Yong
Huang, Fei
Zhou, Lian
Jin, Cheng
Zhang, Yuejie
Zhang, Tao
[J]. SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017, : 889 - 892
[3] Hybrid attention network for image captioning
Jiang, Wenhui
Li, Qin
Zhan, Kun
Fang, Yuming
Shen, Fei
[J]. DISPLAYS, 2022, 73
[4] Multivariate Attention Network for Image Captioning
Wang, Weixuan
Chen, Zhihong
Hu, Haifeng
[J]. COMPUTER VISION - ACCV 2018, PT VI, 2019, 11366 : 587 - 602
[5] Hierarchical Deep Neural Network for Image Captioning
Su, Yuting
Li, Yuqian
Xu, Ning
Liu, An-An
[J]. NEURAL PROCESSING LETTERS, 2020, 52 (02) : 1057 - 1067
[6] A SEQUENTIAL GUIDING NETWORK WITH ATTENTION FOR IMAGE CAPTIONING
Sow, Daouda
Qin, Zengchang
Niasse, Mouhamed
Wan, Tao
[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3802 - 3806
[7] Attention on Attention for Image Captioning
Huang, Lun
Wang, Wenmin
Chen, Jie
Wei, Xiao-Yong
[J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4633 - 4642
[8] Multimodal-enhanced hierarchical attention network for video captioning
Zhong, Maosheng
Chen, Youde
Zhang, Hao
Xiong, Hao
Wang, Zhixiang
[J]. MULTIMEDIA SYSTEMS, 2023, 29 (05) : 2469 - 2482