Infrared Image Caption Based on Object-Oriented Attention

Cited by: 2

Authors:

Lv, Junfeng ^{[1]}

Hui, Tian ^{[1]}

Zhi, Yongfeng ^{[1]}

Xu, Yuelei ^{[1]}

Affiliations:

[1] Northwestern Polytech Univ, Inst Unmanned Syst Res, Xian 710072, Peoples R China

Source:

ENTROPY | 2023 / Volume 25 / Issue 05

Keywords:

infrared image caption; domain transfer object detection; adaptive weighting module; object oriented attention;

DOI:

10.3390/e25050826

Chinese Library Classification number:

O4 [Physics];

Discipline classification code:

0702 ;

Abstract:

With the ongoing development of image technology, the deployment of various intelligent applications on embedded devices has attracted increased attention in the industry. One such application is automatic image captioning for infrared images, which involves converting images into text. This practical task is widely used in night security, as well as for understanding night scenes and other scenarios. However, due to the differences in image features and the complexity of semantic information, generating captions for infrared images remains a challenging task. From the perspective of deployment and application, to improve the correlation between descriptions and objects, we introduced YOLOv6 and LSTM as the encoder-decoder structure and proposed an infrared image captioning method based on object-oriented attention. Firstly, to improve the domain adaptability of the detector, we optimized the pseudo-label learning process. Secondly, we proposed the object-oriented attention method to address the alignment problem between complex semantic information and embedded words. This method helps select the most crucial features of the object region and guides the caption model in generating words that are more relevant to the object. Our methods have shown good performance on infrared images and can produce words explicitly associated with the object regions located by the detector. The robustness and effectiveness of the proposed methods were demonstrated through evaluation on various datasets and comparison with other state-of-the-art methods. Our approach achieved BLEU-4 scores of 31.6 and 41.2 on the KAIST and Infrared City and Town datasets, respectively. Our approach provides a feasible solution for the deployment of embedded devices in industrial applications.


Pages: 18
