Exploring region features in remote sensing image captioning

Cited by: 3
Authors
Zhao, Kai [1 ]
Xiong, Wei [2 ]
Affiliations
[1] Space Engn Univ, Beijing 101400, Peoples R China
[2] Space Engn Univ, Sci & Technol Complex Elect Syst Simulat Lab, Beijing 101400, Peoples R China
Keywords
Transformer model; Image processing; Model training; Remote sensing image captioning
DOI
10.1016/j.jag.2024.103672
Chinese Library Classification (CLC) number
TP7 [Remote sensing technology]
Discipline classification codes
081102 ; 0816 ; 081602 ; 083002 ; 1404 ;
Abstract
Remote sensing image captioning (RSIC), an emerging field of cross-modal tasks, has become a popular research topic in recent years. Feature extraction underlies all RSIC tasks, and current approaches use grid features. Compared with grid features, region features provide object-level, location-related information; however, these features have not been considered in RSIC tasks. Therefore, this study examined the performance of region features on RSIC tasks. We generated region annotations based on published RSIC datasets to address the need for region-related datasets. We extracted region features according to the labeled data and proposed a Region Attention Transformer model. To solve the information loss problem caused by region of interest pooling during region feature extraction, we proposed region-grid features and used geometry relationships to estimate correlations between different region features. We compared the performances of the models using grid and region features. The results showed that region features performed well in RSIC tasks, and region features forced the model to pay more attention to object regions when generating object-related words. This study describes a novel method of using features in RSIC tasks. Our region annotations are available at https://github.com/zk-1019/exploring.
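The abstract mentions using geometry relationships to estimate correlations between region features but does not give a formula. As a rough illustration only, the sketch below computes a relative-geometry vector between two bounding boxes using a log-ratio encoding that is common in relation-aware attention models. The function name `relative_geometry`, the `(x1, y1, x2, y2)` box format, and the `eps` floor are assumptions made here, not details taken from the paper.

```python
import math

def relative_geometry(box_a, box_b, eps=1e-3):
    """Relative geometry of box_b with respect to box_a.

    Boxes are (x1, y1, x2, y2) tuples with positive width and height.
    This encoding is an assumed, commonly used one, not the paper's.
    """
    # Centre and size of each box.
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    aw, ah = box_a[2] - box_a[0], box_a[3] - box_a[1]
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    bw, bh = box_b[2] - box_b[0], box_b[3] - box_b[1]

    # Log-scaled centre offsets (floored at eps to avoid log(0))
    # and log-scaled width/height ratios.
    return (
        math.log(max(abs(ax - bx), eps) / aw),
        math.log(max(abs(ay - by), eps) / ah),
        math.log(bw / aw),
        math.log(bh / ah),
    )

# Two hypothetical regions: a 10x10 box and a 20x10 box to its right.
geom = relative_geometry((0, 0, 10, 10), (10, 0, 30, 10))
```

In a model of this kind, such a vector would typically be embedded and added as a bias to the attention weights between the two regions; whether the paper does this is not stated in the abstract.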
Pages: 13
Related papers
50 in total
  • [41] MULTI-SCALE CROPPING MECHANISM FOR REMOTE SENSING IMAGE CAPTIONING
    Zhang, Xueting
    Wang, Qi
    Chen, Shangdong
    Li, Xuelong
    [J]. 2019 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2019), 2019, : 10039 - 10042
  • [42] A Novel SVM-Based Decoder for Remote Sensing Image Captioning
    Hoxha, Genc
    Melgani, Farid
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [43] Exploring region relationships implicitly: Image captioning with visual relationship attention
    Zhang, Zongjian
    Wu, Qiang
    Wang, Yang
    Chen, Fang
    [J]. IMAGE AND VISION COMPUTING, 2021, 109
  • [44] Multi-Source Interactive Stair Attention for Remote Sensing Image Captioning
    Zhang, Xiangrong
    Li, Yunpeng
    Wang, Xin
    Liu, Feixiang
    Wu, Zhaoji
    Cheng, Xina
    Jiao, Licheng
    [J]. REMOTE SENSING, 2023, 15 (03)
  • [45] High-Resolution Remote Sensing Image Captioning Based on Structured Attention
    Zhao, Rui
    Shi, Zhenwei
    Zou, Zhengxia
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [46] Multi-label semantic feature fusion for remote sensing image captioning
    Wang, Shuang
    Ye, Xiutiao
    Gu, Yu
    Wang, Jihui
    Meng, Yun
    Tian, Jingxian
    Hou, Biao
    Jiao, Licheng
    [J]. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2022, 184 : 1 - 18
  • [47] A Patch-Level Region-Aware Module with a Multi-Label Framework for Remote Sensing Image Captioning
    Li, Yunpeng
    Zhang, Xiangrong
    Zhang, Tianyang
    Wang, Guanchun
    Wang, Xinlin
    Li, Shuo
    [J]. REMOTE SENSING, 2024, 16 (21)
  • [48] Self-Learning for Few-Shot Remote Sensing Image Captioning
    Zhou, Haonan
    Du, Xiaoping
    Xia, Lurui
    Li, Sen
    [J]. REMOTE SENSING, 2022, 14 (18)
  • [49] A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning
    Chavhan, Ruchika
    Banerjee, Biplab
    Zhu, Xiao Xiang
    Chaudhuri, Subhasis
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 4918 - 4925
  • [50] TSFE: Two-Stage Feature Enhancement for Remote Sensing Image Captioning
    Guo, Jie
    Li, Ze
    Song, Bin
    Chi, Yuhao
    [J]. REMOTE SENSING, 2024, 16 (11)