Automatic Generation of Image Caption Based on Semantic Relation using Deep Visual Attention Prediction

Cited by: 0
Authors
El-gayar, M. M. [1, 2]
Affiliations
[1] Mansoura Univ Mansoura, Fac Comp & Informat, Dept Informat Technol, Mansoura 35516, Egypt
[2] New Mansoura Univ, Fac Comp Sci & Engn, New Mansoura, Egypt
Keywords
Semantic image captioning; deep visual attention model; long short-term memory; wavelet driven convolutional neural network
DOI
10.14569/IJACSA.2023.0140912
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
While modern systems for managing, retrieving, and analyzing images rely heavily on semantic captions to categorize images, deriving those captions is a considerable challenge because of the effort that manual processing requires, particularly with large images. Despite significant advances in automatic image caption generation and human attention prediction through convolutional neural networks, the attention models in these networks still need to make more efficient use of multi-scale features. To address this need, our study presents a novel image decoding model that integrates a wavelet-driven convolutional neural network with a dual-stage discrete wavelet transform, enabling the extraction of salient features within images. We use a wavelet-driven convolutional neural network as the encoder, coupled with a deep visual prediction model and Long Short-Term Memory as the decoder. The deep visual prediction model calculates channel and location attention for visual attention features, with local features assessed by considering the spatial-contextual relationship among objects. Our primary contribution is an encoder and decoder model that automatically generates a semantic caption for an image based on the semantic contextual information and spatial features present in the image. We also improved the performance of this model, as demonstrated by experiments on three widely used datasets: Flickr8K, Flickr30K, and MSCOCO. The proposed approach outperformed current methods, achieving superior results in BLEU, METEOR, and GLEU scores. This research offers a significant advance in image captioning and attention prediction models and presents a promising direction for future work in this field.
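The abstract mentions a dual-stage discrete wavelet transform as the source of multi-scale features. The sketch below shows, under stated assumptions, what a two-level 2D decomposition looks like in general. It is not the paper's implementation: the Haar wavelet, the function names `haar_dwt2` and `two_level_dwt`, and the 8 x 8 toy input are all illustrative choices that the record does not specify.

```python
import numpy as np

def haar_dwt2(x):
    """One level of an orthonormal 2D Haar transform (assumed wavelet).

    Expects a 2D array with even height and width. Returns the
    approximation band and a tuple of three detail bands. Sub-band
    naming conventions vary between libraries.
    """
    s = np.sqrt(2.0)
    # Pairwise sums and differences along each row (axis 1).
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # Repeat along each column (axis 0) for both intermediate results.
    ll = (lo[0::2, :] + lo[1::2, :]) / s
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, (lh, hl, hh)

def two_level_dwt(image):
    """Apply the transform twice, recursing on the approximation band."""
    ll1, detail1 = haar_dwt2(image)
    ll2, detail2 = haar_dwt2(ll1)
    return ll2, detail2, detail1

# Toy 8 x 8 "image" (hypothetical data, for shape checking only).
image = np.arange(64, dtype=float).reshape(8, 8)
ll2, detail2, detail1 = two_level_dwt(image)
print(ll2.shape, detail2[0].shape, detail1[0].shape)
```

Each level halves the spatial resolution, so the coarse band and the two sets of detail bands together give features at two scales. How the paper feeds such bands into its convolutional encoder is not described in this record.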
Pages: 105 - 114
Page count: 10
Related papers
50 in total
  • [11] VD-SAN: Visual-Densely Semantic Attention Network for Image Caption Generation
    He, Xinwei
    Yang, Yang
    Shi, Baoguang
    Bai, Xiang
    NEUROCOMPUTING, 2019, 328 : 48 - 55
  • [12] Image Caption Generation Using Attention Model
    Ramalakshmi, Eliganti
    Jain, Moksh Sailesh
    Uddin, Mohammed Ameer
    INNOVATIVE DATA COMMUNICATION TECHNOLOGIES AND APPLICATION, ICIDCA 2021, 2022, 96 : 1009 - 1017
  • [13] Cross-Lingual Image Caption Generation Based on Visual Attention Model
    Wang, Bin
    Wang, Cungang
    Zhang, Qian
    Su, Ying
    Wang, Yang
    Xu, Yanyan
    IEEE ACCESS, 2020, 8 : 104543 - 104554
  • [14] Image Caption Generation with Hierarchical Contextual Visual Spatial Attention
    Khademi, Mahmoud
    Schulte, Oliver
    PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, : 2024 - 2032
  • [15] Image Caption Generation Using A Deep Architecture
    Hani, Ansar
    Tagougui, Najiba
    Kherallah, Monji
    2019 INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2019, : 246 - 251
  • [16] Bahdanau Attention Based Bengali Image Caption Generation
    Alam, Md Sahrial
    Rahman, Md Sayedur
    Hosen, Md Ikbal
    Mubin, Khairul Anam
    Hossen, Sharif
    Mridha, M. F.
    2022 INTERNATIONAL CONFERENCE ON DECISION AID SCIENCES AND APPLICATIONS (DASA), 2022, : 1073 - 1077
  • [17] Image caption generation using a dual attention mechanism
    Padate, Roshni
    Jain, Amit
    Kalla, Mukesh
    Sharma, Arvind
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123
  • [18] Clothes image caption generation with attribute detection and visual attention model
    Li, Xianrui
    Ye, Zhiling
    Zhang, Zhao
    Zhao, Mingbo
    PATTERN RECOGNITION LETTERS, 2021, 141 (141) : 68 - 74
  • [19] Chinese Image Caption Generation via Visual Attention and Topic Modeling
    Liu, Maofu
    Hu, Huijun
    Li, Lingjun
    Yu, Yan
    Guan, Weili
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (02) : 1247 - 1257
  • [20] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
    Xu, Kelvin
    Ba, Jimmy Lei
    Kiros, Ryan
    Cho, Kyunghyun
    Courville, Aaron
    Salakhutdinov, Ruslan
    Zemel, Richard S.
    Bengio, Yoshua
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 2048 - 2057