Image Captioning with Text-Based Visual Attention

Cited by: 15
Authors
He, Chen [1]
Hu, Haifeng [1]
Affiliation
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Multimodal recurrent neural network; Text-based visual attention; Transposed weight sharing;
DOI
10.1007/s11063-018-9807-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Attention mechanisms have attracted considerable interest in image captioning due to their strong performance. However, many visual attention models do not consider the correlation between the image and the textual context, which may lead to attention vectors that contain irrelevant annotation vectors. To overcome this limitation, we propose a new text-based visual attention (TBVA) model that, given the previously generated text, automatically focuses on salient objects by eliminating irrelevant information. The proposed end-to-end caption generation model adopts the multimodal recurrent neural network architecture. We leverage the transposed weight sharing scheme to achieve better performance by reducing the number of parameters. The effectiveness of our model is validated on MS COCO and Flickr30k, and the results show that TBVA outperforms state-of-the-art image captioning methods.
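The abstract does not give TBVA's equations, so the following is only a rough, hypothetical illustration of the two ideas it names. It shows a generic additive (Bahdanau-style) soft-attention step whose region scores are conditioned on a vector summarizing the previously generated words, and an output layer that reuses the word-embedding table in the spirit of transposed weight sharing. The function names (`text_conditioned_attention`, `tied_output_logits`), the parameters `w_a`, `w_t`, `v`, and the tanh scoring form are all assumptions, not the authors' formulation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_conditioned_attention(
    regions: List[Vector],  # annotation vectors a_i, one per image region (dim D)
    text_ctx: Vector,       # hypothetical summary of previously generated words (dim T)
    w_a: Matrix,            # hypothetical region projection, shape K x D
    w_t: Matrix,            # hypothetical text projection, shape K x T
    v: Vector,              # hypothetical scoring vector, length K
) -> Tuple[Vector, Vector]:
    """Generic additive soft attention conditioned on text (assumed, not TBVA's exact form).

    score_i = v . tanh(W_a a_i + W_t t),  alpha = softmax(score),
    context = sum_i alpha_i * a_i
    """
    proj_t = matvec(w_t, text_ctx)  # computed once, shared by all regions
    scores = []
    for a in regions:
        hidden = [math.tanh(p + q) for p, q in zip(matvec(w_a, a), proj_t)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    alpha = softmax(scores)
    # Weighted sum of region vectors: regions the text context scores low contribute little.
    context = [
        sum(alpha[i] * regions[i][d] for i in range(len(regions)))
        for d in range(len(regions[0]))
    ]
    return alpha, context


def tied_output_logits(embedding: Matrix, hidden: Vector) -> Vector:
    """Hypothetical output layer that reuses the word-embedding table (V rows x E cols).

    Multiplying the V x E table by an E-dim hidden state is the same as applying
    the transposed embedding matrix, so no separate V x E output matrix is stored.
    """
    return matvec(embedding, hidden)
```

Calling `text_conditioned_attention` on the same regions with two different text contexts can reorder which regions receive more weight, which is the kind of text-dependent selection the abstract describes. `tied_output_logits` needs no separate vocabulary-sized output matrix, which is where a parameter saving of this sort comes from.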
Pages: 177-185
Page count: 9
Journal: Neural Processing Letters, 2019, 49: 177-185