Image Captioning with Text-Based Visual Attention

Cited by: 15
Authors
He, Chen [1 ]
Hu, Haifeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Multimodal recurrent neural network; Text-based visual attention; Transposed weight sharing;
DOI
10.1007/s11063-018-9807-7
CLC number
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Attention mechanisms have attracted considerable interest in image captioning due to their strong performance. However, many visual attention models fail to consider the correlation between the image and the textual context, which may produce attention vectors that contain irrelevant annotation vectors. To overcome this limitation, we propose a new text-based visual attention (TBVA) model that automatically focuses on salient objects by eliminating irrelevant information conditioned on the previously generated text. The proposed end-to-end caption generation model adopts a multimodal recurrent neural network architecture. We leverage the transposed weight sharing scheme to achieve better performance while reducing the number of parameters. The effectiveness of our model is validated on MS COCO and Flickr30k. The results show that TBVA outperforms state-of-the-art image captioning methods.
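The two ideas named in the abstract can be sketched briefly. The sketch below is a minimal illustration, not the paper's implementation: the dimensions, weight matrices, and the additive scoring form are all assumptions. It shows (1) text-conditioned visual attention, where region annotation vectors are scored against a hidden state summarizing the previously generated text, so irrelevant regions receive low weight, and (2) transposed weight sharing, where the output layer reuses the transpose of the word-embedding matrix instead of learning a separate decoding matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper):
# k regions, annotation dim d, hidden dim h, vocabulary size vocab.
k, d, h, vocab = 5, 8, 8, 10

A = rng.normal(size=(k, d))    # image annotation vectors, one per region
h_t = rng.normal(size=(h,))    # RNN hidden state encoding previously generated text
W_a = rng.normal(size=(d, d))  # projects annotation vectors
W_h = rng.normal(size=(h, d))  # projects the textual context
v = rng.normal(size=(d,))      # scoring vector

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Text-conditioned attention: each region is scored against the textual
# context, so regions irrelevant to the generated words get low weight.
scores = np.tanh(A @ W_a + h_t @ W_h) @ v  # shape (k,)
alpha = softmax(scores)                    # attention distribution over regions
z_t = alpha @ A                            # attended visual context vector

# Transposed weight sharing: the output layer reuses the transpose of the
# word-embedding matrix E, saving a vocab-by-d block of parameters.
E = rng.normal(size=(vocab, d))            # word-embedding matrix
p_word = softmax(z_t @ E.T)                # distribution over the vocabulary
```

Because `alpha` depends on `h_t`, the attended vector `z_t` shifts toward regions relevant to the caption generated so far, which is the intuition behind conditioning attention on text rather than on the image alone.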
Pages: 177-185 (9 pages)