Region-guided transformer for remote sensing image captioning

Authors
Zhao, Kai [1]
Xiong, Wei [2]
Affiliations
[1] Space Engn Univ, Natl Key Lab Space Target Awareness, Beijing, Peoples R China
[2] Space Engn Univ, Sci & Technol Complex Elect Syst Simulat Lab, Beijing, Peoples R China
Keywords
Image captioning; deep learning; semantic understanding; ATTENTION; NETWORK;
DOI
10.1080/17538947.2024.2400988
Chinese Library Classification number
P9 [Physical Geography]
Discipline classification code
0705; 070501
Abstract
Remote sensing image acquisition is an essential way to obtain information. However, research on remote sensing images has mainly focused on object detection and image classification. The emergence of remote sensing image captioning (RSIC) has enabled understanding of and inference from remote sensing images, and has thus attracted considerable attention. Challenges remain in RSIC: most methods rely on grid features, which make it difficult for the model to determine the main targets to describe, and a more effective cross-modal matching method is needed for better text generation. To address these issues, we propose a region-guided transformer. We extract region features to strengthen the model's ability to focus on the main targets. To compensate for the information loss caused by region feature extraction, we propose environment features that supplement background information. To improve the matching between text and image features, we propose a region-guided decoder that enhances the model's perception of different features through a weighted cross-attention mechanism. We also introduce region-guided information to guide the text-generation process. Extensive experiments demonstrate the effectiveness and superiority of our model.
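The abstract does not spell out how the weighted cross-attention combines the two feature types, so the following is only a minimal sketch of one plausible reading: a text query attends separately to region features and environment features, and the two context vectors are mixed with softmax-normalized weights. The function names, the single-query setup, and the `gate_logits` parameter are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, features):
    """Scaled dot-product attention of one query vector over feature vectors."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, f)) / math.sqrt(d) for f in features
    ]
    weights = softmax(scores)
    # Weighted sum of the feature vectors (features act as keys and values).
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(d)]


def weighted_cross_attention(query, region_feats, env_feats, gate_logits=(0.0, 0.0)):
    """Hypothetical fusion: attend to each feature set, then mix the results.

    gate_logits is an assumed stand-in for learned weights; equal logits
    give an equal mix of the region and environment contexts.
    """
    region_ctx = cross_attention(query, region_feats)
    env_ctx = cross_attention(query, env_feats)
    w_region, w_env = softmax(list(gate_logits))
    return [w_region * r + w_env * e for r, e in zip(region_ctx, env_ctx)]
```

In a trained model the mixing weights would presumably be learned or predicted from the decoder state; here they are fixed inputs so the sketch stays self-contained.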
Pages: 19