A novel approach for image retrieval in remote sensing using vision-language-based image caption generation

Cited by: 0

Authors:

Prem Shanker Yadav [1]

Dinesh Kumar Tyagi [1]

Santosh Kumar Vipparthi [2]

Affiliations:

[1] Malaviya National Institute of Technology, Department of Computer Science and Engineering

[2] Indian Institute of Technology, School of Artificial Intelligence and Data Engineering

Source:

Multimedia Tools and Applications | 2025, Vol. 84, No. 6

Keywords:

Image caption generation; Image retrieval; Remote sensing big data; Vision language pre-training model; TF-IDF

DOI:

10.1007/s11042-024-20447-w

Abstract:

Recent advances in satellite technology have led to rapid growth in the volume of Remote Sensing (RS) images, so designing a precise model that retrieves the images most pertinent to a query has become a primary research problem. Present Remote Sensing Image Retrieval (RSIR) systems use visual descriptors to characterize the primitives (such as various land-cover types) that are visible in the images. However, visual descriptors are inadequate for describing the complex content of RS images. To address this problem, a new image retrieval model based on image captions is devised. The goal is to generate textual descriptions (captions) that precisely define the relations among objects. Image captioning is performed with a vision-language pre-training model. Features are then extracted from the image captions: term frequency-inverse document frequency (TF-IDF), text length, and Bag of Words. The same features are extracted from the query text. The similarity between the query text features and the image caption features is computed with a hybrid similarity measure whose weights are tuned by the proposed Honey Badger Political Optimizer (HBPO), and the resulting scores are used to retrieve the image. With HBPO, the proposed model achieved a precision of 93.3%, recall of 93.7%, F1-score of 93.5%, and a Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score of 0.441.
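The retrieval step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which similarity components make up the hybrid measure, so the choices here are assumptions: cosine similarity over smoothed TF-IDF vectors, Jaccard overlap over Bag-of-Words vocabularies, and a length ratio for text length. The function names and the fixed weights are hypothetical; in the paper the weights are tuned by HBPO, which is not sketched here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (an assumed, simplistic choice)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document (smoothed IDF)."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Vocabulary overlap between two token lists (Bag-of-Words proxy)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def length_similarity(a, b):
    """Ratio of the shorter token count to the longer one."""
    longest = max(len(a), len(b))
    return min(len(a), len(b)) / longest if longest else 0.0


def hybrid_scores(query, captions, weights=(0.6, 0.3, 0.1)):
    """Score every caption against the query with a weighted sum.

    The weights are placeholders; the paper tunes them with HBPO.
    """
    docs = [tokenize(c) for c in captions]
    q = tokenize(query)
    vectors = tfidf_vectors(docs + [q])  # the query shares the IDF statistics
    q_vec = vectors[-1]
    w_tfidf, w_bow, w_len = weights
    return [
        w_tfidf * cosine(q_vec, vec)
        + w_bow * jaccard(q, doc)
        + w_len * length_similarity(q, doc)
        for doc, vec in zip(docs, vectors)
    ]


def retrieve(query, captions, top_k=1, weights=(0.6, 0.3, 0.1)):
    """Return the indices of the top_k captions, best match first."""
    scores = hybrid_scores(query, captions, weights)
    ranked = sorted(range(len(captions)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]
```

Calling `retrieve(query, captions)` with one generated caption per image returns the index of the image whose caption best matches the query text under these assumed components.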

Pages: 2985-3014

Page count: 29
