Automatic Generation of Image Caption Based on Semantic Relation using Deep Visual Attention Prediction

Cited by: 0
Authors
El-gayar, M. M. [1, 2]
Affiliations
[1] Mansoura Univ, Fac Comp & Informat, Dept Informat Technol, Mansoura 35516, Egypt
[2] New Mansoura Univ, Fac Comp Sci & Engn, New Mansoura, Egypt
Keywords
Semantic image captioning; deep visual attention model; long short-term memory; wavelet-driven convolutional neural network
DOI
10.14569/IJACSA.2023.0140912
CLC Number
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
Modern systems for managing, retrieving, and analyzing images rely heavily on semantic captions to categorize their content, yet producing such captions remains challenging because manual annotation demands extensive effort, particularly for large image collections. Despite significant advances in automatic image caption generation and human attention prediction with convolutional neural networks, the attention models in these networks still need more efficient use of multi-scale features. To address this need, our study presents a novel image decoding model that integrates a wavelet-driven convolutional neural network with a dual-stage discrete wavelet transform, enabling the extraction of salient features within images. We use the wavelet-driven convolutional neural network as the encoder, coupled with a deep visual prediction model and a Long Short-Term Memory network as the decoder. The deep visual prediction model computes channel and location attention over the visual features, while local features are assessed by considering the spatial-contextual relationships among objects. Our primary contribution is an encoder-decoder model that automatically generates a semantic caption for an image based on the semantic contextual information and spatial features present in that image. We further improve the model's performance, as demonstrated through experiments on three widely used datasets: Flickr8K, Flickr30K, and MSCOCO. The proposed approach outperforms current methods, achieving superior BLEU, METEOR, and GLEU scores. This research offers a significant advancement in image captioning and attention prediction models and presents a promising direction for future work in this field.
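The encoder-decoder pipeline described in the abstract (dual-stage discrete wavelet transform feeding a wavelet-driven CNN encoder, channel and location attention, and an LSTM caption decoder) can be outlined in code. The following is a minimal PyTorch sketch of that pipeline, not the authors' published implementation: the Haar-wavelet convolution, layer sizes, squeeze-and-excitation-style channel attention, and the mean-pooled LSTM conditioning are all illustrative assumptions.

```python
# Minimal sketch of a wavelet-driven CNN encoder with channel/location attention
# and an LSTM decoder, assuming Haar wavelets and illustrative layer sizes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HaarDWT(nn.Module):
    """Single-stage 2-D Haar discrete wavelet transform via fixed strided convolutions."""
    def __init__(self):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
        lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
        hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
        self.register_buffer("kernels", torch.stack([ll, lh, hl, hh]).unsqueeze(1))

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, h, w = x.shape
        x = x.reshape(b * c, 1, h, w)
        out = F.conv2d(x, self.kernels, stride=2)      # (B*C, 4, H/2, W/2)
        return out.reshape(b, c * 4, h // 2, w // 2)   # 4 sub-bands per channel


class WaveletEncoder(nn.Module):
    """Dual-stage DWT interleaved with convolutions, plus channel and location attention."""
    def __init__(self, out_dim=512):
        super().__init__()
        self.dwt = HaarDWT()
        self.conv1 = nn.Conv2d(3 * 4, 64, 3, padding=1)
        self.conv2 = nn.Conv2d(64 * 4, out_dim, 3, padding=1)
        # Channel attention (squeeze-and-excitation style) and location attention.
        self.channel_fc = nn.Sequential(nn.Linear(out_dim, out_dim // 8), nn.ReLU(),
                                        nn.Linear(out_dim // 8, out_dim), nn.Sigmoid())
        self.location_conv = nn.Conv2d(out_dim, 1, 1)

    def forward(self, img):                            # img: (B, 3, H, W)
        f = F.relu(self.conv1(self.dwt(img)))          # stage-1 DWT + conv
        f = F.relu(self.conv2(self.dwt(f)))            # stage-2 DWT + conv
        ca = self.channel_fc(f.mean(dim=(2, 3)))       # (B, C) channel weights
        f = f * ca.unsqueeze(-1).unsqueeze(-1)
        la = torch.sigmoid(self.location_conv(f))      # (B, 1, h, w) location weights
        f = f * la
        return f.flatten(2).transpose(1, 2)            # (B, h*w, C) region features


class LSTMDecoder(nn.Module):
    """LSTM caption decoder conditioned on mean-pooled attended image features."""
    def __init__(self, vocab_size, feat_dim=512, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.init_h = nn.Linear(feat_dim, hidden_dim)
        self.init_c = nn.Linear(feat_dim, hidden_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):                # captions: (B, T) token ids
        ctx = feats.mean(dim=1)                        # pooled visual context
        h0, c0 = self.init_h(ctx).unsqueeze(0), self.init_c(ctx).unsqueeze(0)
        out, _ = self.lstm(self.embed(captions), (h0, c0))
        return self.out(out)                           # (B, T, vocab) word logits


# Usage with random data, purely to show the tensor flow from image to word logits:
if __name__ == "__main__":
    enc, dec = WaveletEncoder(), LSTMDecoder(vocab_size=10000)
    logits = dec(enc(torch.randn(2, 3, 224, 224)), torch.randint(0, 10000, (2, 12)))
    print(logits.shape)                                # torch.Size([2, 12, 10000])
```

In a full system the decoder would attend over the region features at every time step and be trained with teacher forcing; this sketch only illustrates the overall data flow.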
Pages: 105-114
Number of pages: 10