Image Captioning with Text-Based Visual Attention

Cited by: 15
Authors
He, Chen [1 ]
Hu, Haifeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Multimodal recurrent neural network; Text-based visual attention; Transposed weight sharing;
DOI
10.1007/s11063-018-9807-7
CLC number
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Attention mechanisms have attracted considerable interest in image captioning due to their strong performance. However, many visual attention models fail to consider the correlation between the image and the textual context, which may produce attention vectors that contain irrelevant annotation vectors. To overcome this limitation, we propose a new text-based visual attention (TBVA) model that automatically focuses on salient objects by eliminating irrelevant information conditioned on the previously generated text. The proposed end-to-end caption generation model adopts a multimodal recurrent neural network architecture. We leverage the transposed weight sharing scheme to achieve better performance while reducing the number of parameters. The effectiveness of our model is validated on MS COCO and Flickr30k. The results show that TBVA outperforms state-of-the-art image captioning methods.
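The two ideas named in the abstract can be sketched briefly. The sketch below is a minimal illustration, not the paper's implementation: the dimensions, weight matrices, and the additive scoring form are all assumptions. It shows (1) text-conditioned visual attention, where region annotation vectors are scored against a hidden state summarizing the previously generated text, so irrelevant regions receive low weight, and (2) transposed weight sharing, where the output layer reuses the transpose of the word-embedding matrix instead of learning a separate decoding matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper):
# k regions, annotation dim d, hidden dim h, vocabulary size vocab.
k, d, h, vocab = 5, 8, 8, 10

A = rng.normal(size=(k, d))    # image annotation vectors, one per region
h_t = rng.normal(size=(h,))    # RNN hidden state encoding previously generated text
W_a = rng.normal(size=(d, d))  # projects annotation vectors
W_h = rng.normal(size=(h, d))  # projects the textual context
v = rng.normal(size=(d,))      # scoring vector

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Text-conditioned attention: each region is scored against the textual
# context, so regions irrelevant to the generated words get low weight.
scores = np.tanh(A @ W_a + h_t @ W_h) @ v  # shape (k,)
alpha = softmax(scores)                    # attention distribution over regions
z_t = alpha @ A                            # attended visual context vector

# Transposed weight sharing: the output layer reuses the transpose of the
# word-embedding matrix E, saving a vocab-by-d block of parameters.
E = rng.normal(size=(vocab, d))            # word-embedding matrix
p_word = softmax(z_t @ E.T)                # distribution over the vocabulary
```

Because `alpha` depends on `h_t`, the attended vector `z_t` shifts toward regions relevant to the caption generated so far, which is the intuition behind conditioning attention on text rather than on the image alone.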
Pages: 177-185 (9 pages)