Automatic Generation of Image Caption Based on Semantic Relation using Deep Visual Attention Prediction

Cited by: 0

Authors:

El-gayar, M. M. [1,2]

Affiliations:

[1] Mansoura Univ Mansoura, Fac Comp & Informat, Dept Informat Technol, Mansoura 35516, Egypt

[2] New Mansoura Univ, Fac Comp Sci & Engn, New Mansoura, Egypt

Source:

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS | 2023 / Vol. 14 / No. 09

Keywords:

Semantic image captioning; deep visual attention model; long short-term memory; wavelet driven convolutional neural network;

DOI:

10.14569/IJACSA.2023.0140912

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

Modern systems for managing, retrieving, and analyzing images rely heavily on semantic captions to categorize images, but producing those captions manually is a considerable challenge because of the effort required, particularly with large images. Despite significant advances in automatic image caption generation and human attention prediction with convolutional neural networks, the attention models in these networks still need to make more efficient use of multi-scale features. To address this need, our study presents a novel image decoding model that integrates a wavelet-driven convolutional neural network with a dual-stage discrete wavelet transform, enabling the extraction of salient features within images. We use the wavelet-driven convolutional neural network as the encoder, coupled with a deep visual prediction model and Long Short-Term Memory (LSTM) as the decoder. The deep visual prediction model calculates channel and location attention for visual attention features, with local features assessed by considering the spatial-contextual relationship among objects. Our primary contribution is an encoder-decoder model that automatically generates a semantic caption for an image based on the semantic contextual information and spatial features present in it. We also improved the performance of this model, as demonstrated by experiments on three widely used datasets: Flickr8K, Flickr30K, and MSCOCO. The proposed approach outperformed current methods, achieving superior BLEU, METEOR, and GLEU scores. This research offers a significant advancement in image captioning and attention prediction models, presenting a promising direction for future work in this field.
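
As a rough illustration of two ideas named in the abstract, the sketch below applies a two-stage 2-D discrete wavelet transform and then computes simple channel and location attention weights. It is not the paper's implementation: the abstract does not state the wavelet family or the attention formulation, so the Haar wavelet, the softmax-over-mean-activation attention, and all function names (`haar_dwt2`, `two_stage_dwt`, `channel_attention`, `location_attention`) are assumptions chosen for brevity.

```python
import numpy as np

def haar_dwt2(x):
    """One level of an orthonormal 2-D Haar DWT (assumed wavelet).

    x must have even height and width. Subband naming conventions
    vary between libraries; the labels here are illustrative.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # approximation (low-pass in both directions)
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, (lh, hl, hh)

def two_stage_dwt(x):
    """Apply the transform twice, recursing on the approximation band."""
    ll1, details1 = haar_dwt2(x)
    ll2, details2 = haar_dwt2(ll1)
    return ll2, details1, details2

def _softmax(v):
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    return e / e.sum()

def channel_attention(feats):
    """feats has shape (C, H, W); returns one weight per channel.

    Assumption: weights are a softmax over each channel's mean activation.
    """
    return _softmax(feats.mean(axis=(1, 2)))

def location_attention(feats):
    """feats has shape (C, H, W); returns one weight per spatial position.

    Assumption: weights are a softmax over the channel-averaged map.
    """
    return _softmax(feats.mean(axis=0))

if __name__ == "__main__":
    image = np.random.default_rng(0).normal(size=(8, 8))
    ll2, details1, details2 = two_stage_dwt(image)
    # Stack the second-stage subbands as a toy 4-channel feature map.
    feats = np.stack([ll2, *details2])
    print(channel_attention(feats).shape, location_attention(feats).shape)
```

In an actual encoder these subbands would feed convolutional layers, and the attention weights would come from learned projections conditioned on the LSTM state rather than from fixed pooling.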

Pages: 105-114

Number of pages: 10


