Exploring region features in remote sensing image captioning

Cited by: 3
Authors
Zhao, Kai [1]
Xiong, Wei [2]
Affiliations
[1] Space Engn Univ, Beijing 101400, Peoples R China
[2] Space Engn Univ, Sci & Technol Complex Elect Syst Simulat Lab, Beijing 101400, Peoples R China
Keywords
Transformer model; Image processing; Model training; Remote sensing image captioning;
DOI
10.1016/j.jag.2024.103672
CLC Classification Number
TP7 [Remote Sensing Technology];
Subject Classification Codes
081102; 0816; 081602; 083002; 1404
Abstract
Remote sensing image captioning (RSIC), an emerging cross-modal task, has become a popular research topic in recent years. Feature extraction underlies all RSIC tasks, and current methods rely on grid features. Compared with grid features, region features provide object-level, location-related information; however, these features have not been considered in RSIC tasks. Therefore, this study examined the performance of region features on RSIC tasks. We generated region annotations based on published RSIC datasets to address the need for region-related datasets. We extracted region features according to the labeled data and proposed a Region Attention Transformer model. To solve the information loss caused by region-of-interest pooling during region feature extraction, we proposed region-grid features and used geometry relationships to estimate correlations between different region features. We compared the performance of models using grid and region features. The results showed that region features perform well in RSIC tasks and force the model to pay more attention to object regions when generating object-related words. This study describes a novel way of using features in RSIC tasks. Our region annotations are available at https://github.com/zk-1019/exploring.
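For illustration, the sketch below shows one common way to inject box geometry into attention over pooled region features, in the spirit of the abstract's "geometry relationships for estimating correlations between different region features". The `box_relation_encoding` function and `GeometryAttention` module are hypothetical names, and the log-offset box encoding is the widely used relative-geometry formulation; this is an assumption for clarity, not the paper's exact Region Attention Transformer implementation.

```python
# Hypothetical sketch: geometry-biased attention between region features.
# Assumes boxes are given as (x1, y1, x2, y2) in pixels; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def box_relation_encoding(boxes: torch.Tensor) -> torch.Tensor:
    """Pairwise geometric features for N boxes, returned as an (N, N, 4) tensor
    of log-scaled relative center offsets and size ratios."""
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    w = (boxes[:, 2] - boxes[:, 0]).clamp(min=1e-3)
    h = (boxes[:, 3] - boxes[:, 1]).clamp(min=1e-3)

    dx = torch.log((cx[:, None] - cx[None, :]).abs().clamp(min=1e-3) / w[:, None])
    dy = torch.log((cy[:, None] - cy[None, :]).abs().clamp(min=1e-3) / h[:, None])
    dw = torch.log(w[None, :] / w[:, None])
    dh = torch.log(h[None, :] / h[:, None])
    return torch.stack([dx, dy, dw, dh], dim=-1)


class GeometryAttention(nn.Module):
    """Single-head attention whose logits are biased by box geometry."""

    def __init__(self, dim: int, geo_dim: int = 4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Small MLP maps each pairwise geometry vector to a scalar attention bias.
        self.geo = nn.Sequential(nn.Linear(geo_dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.scale = dim ** -0.5

    def forward(self, feats: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # feats: (N, dim) pooled region features; boxes: (N, 4) boxes in pixels.
        logits = self.q(feats) @ self.k(feats).t() * self.scale        # (N, N)
        geo_bias = self.geo(box_relation_encoding(boxes)).squeeze(-1)  # (N, N)
        attn = F.softmax(logits + geo_bias, dim=-1)
        return attn @ self.v(feats)


if __name__ == "__main__":
    regions = torch.randn(6, 256)               # 6 pooled region features
    boxes = torch.rand(6, 4) * 224
    boxes[:, 2:] = boxes[:, :2] + boxes[:, 2:]  # ensure x2 > x1 and y2 > y1
    out = GeometryAttention(256)(regions, boxes)
    print(out.shape)                            # torch.Size([6, 256])
```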
Pages: 13
Related Papers
50 records in total
  • [1] Region Driven Remote Sensing Image Captioning
    Kumar, S. Chandeesh
    Hemalatha, M.
    Narayan, S. Badri
    Nandhini, P.
    [J]. 2ND INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ADVANCED COMPUTING (ICRTAC - DISRUPTIVE INNOVATION), 2019, 165 : 32 - 40
  • [2] Exploring Transformer and Multilabel Classification for Remote Sensing Image Captioning
    Kandala, Hitesh
    Saha, Sudipan
    Banerjee, Biplab
    Zhu, Xiao Xiang
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [3] Region-guided transformer for remote sensing image captioning
    Zhao, Kai
    Xiong, Wei
    [J]. INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2024, 17 (01)
  • [4] Improving Remote Sensing Image Captioning by Combining Grid Features and Transformer
    Zhuang, Shuo
    Wang, Ping
    Wang, Gang
    Wang, Di
    Chen, Jinyong
    Gao, Feng
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [5] Word-Sentence Framework for Remote Sensing Image Captioning
    Wang, Qi
    Huang, Wei
    Zhang, Xueting
    Li, Xuelong
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2021, 59 (12) : 10532 - 10543
  • [6] A Systematic Survey of Remote Sensing Image Captioning
    Zhao, Beigeng
    [J]. IEEE ACCESS, 2021, 9 : 154086 - 154111
  • [7] Exploring Multi-Level Attention and Semantic Relationship for Remote Sensing Image Captioning
    Yuan, Zhenghang
    Li, Xuelong
    Wang, Qi
    [J]. IEEE ACCESS, 2020, 8 : 2608 - 2620
  • [8] Exploring better image captioning with grid features
    Yan, Jie
    Xie, Yuxiang
    Guo, Yanming
    Wei, Yingmei
    Luan, Xidao
    [J]. COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (03) : 3541 - 3556
  • [9] Meta captioning: A meta learning based remote sensing image captioning framework
    Yang, Qiaoqiao
    Ni, Zihao
    Ren, Peng
    [J]. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2022, 186 : 190 - 200