Automatic image caption generation using deep learning and multimodal attention

Cited by: 3
Authors
Dai, Jin [1, 2, 3]
Zhang, Xinyu [1, 2, 3]
Affiliations
[1] Shanghai Key Lab Trustworthy Comp, Shanghai, Peoples R China
[2] Engn Res Ctr Software Hardware Codesign Technol &, Shanghai, Peoples R China
[3] East China Normal Univ, Sch Software Engn, Shanghai, Peoples R China
Keywords
attention mechanism; CBAM; deep learning; image captioning
DOI
10.1002/cav.2072
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline code
081202; 0835
Abstract
We present an improved image caption generation model that incorporates a multimodal attention mechanism. We use ResNet-101 to extract image features and augment it with channel and spatial attention (CBAM). We use Faster R-CNN for object detection, together with a multi-head attention structure composed of spatial attention and self-attention. This improves the model's ability to learn and exploit the internal grammatical structure of natural-language sentences. We also use GPU parallel computing to accelerate training of the entire model. We apply our model to early-education scenarios, namely "show and tell" for children. We compare our method with state-of-the-art deep learning methods, and our experimental results show that it improves captioning accuracy on standard automatic evaluation metrics.
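The abstract pairs ResNet-101 features with channel and spatial attention, the combination the keyword list calls CBAM (Convolutional Block Attention Module). Below is a minimal NumPy sketch of the standard CBAM forward pass, assuming the usual formulation: channel attention from average- and max-pooled descriptors passed through a shared two-layer MLP, followed by spatial attention from a 7×7 convolution over channel-pooled maps. It is applied to a map shaped like ResNet-101's last convolutional output (2048×7×7 for a 224×224 input). This is not the authors' implementation; the reduction ratio r = 16, the 7×7 kernel, the random weights, the omission of biases and normalization, and the helper names `channel_attention`, `spatial_attention`, `conv2d_same`, and `cbam` are all illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """feat: (C, H, W); w1: (C//r, C); w2: (C, C//r). Returns (C, 1, 1) weights."""
    avg = feat.mean(axis=(1, 2))  # (C,) global average pooling per channel
    mx = feat.max(axis=(1, 2))    # (C,) global max pooling per channel

    def mlp(v):
        # Shared two-layer MLP with ReLU bottleneck (no biases in this sketch).
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]


def conv2d_same(x, kernel):
    """x: (2, H, W); kernel: (2, k, k) with odd k.
    Single-output, zero-padded 'same' convolution (cross-correlation, as in CNNs)."""
    _, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return out


def spatial_attention(feat, kernel):
    """feat: (C, H, W). Returns (1, H, W) weights."""
    avg = feat.mean(axis=0)         # (H, W) average over channels
    mx = feat.max(axis=0)           # (H, W) max over channels
    stacked = np.stack([avg, mx])   # (2, H, W)
    return sigmoid(conv2d_same(stacked, kernel))[None, :, :]


def cbam(feat, w1, w2, kernel):
    # Channel attention first, then spatial attention on the refined map.
    refined = feat * channel_attention(feat, w1, w2)
    return refined * spatial_attention(refined, kernel)


# Usage with random (untrained) weights on a ResNet-101-sized feature map.
rng = np.random.default_rng(0)
C, H, W, r = 2048, 7, 7, 16
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.01
w2 = rng.standard_normal((C, C // r)) * 0.01
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
print(out.shape)  # (2048, 7, 7)
```

Because both attention maps are sigmoid outputs in (0, 1), the refined map is an elementwise reweighting of the input and never exceeds it in magnitude. In a trained captioner these weights would be learned jointly with the rest of the model rather than drawn at random.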
Pages: 9