Multi-Source Interactive Stair Attention for Remote Sensing Image Captioning

Cited by: 17
Authors
Zhang, Xiangrong [1]
Li, Yunpeng [1]
Wang, Xin [1]
Liu, Feixiang [1]
Wu, Zhaoji [1]
Cheng, Xina [1]
Jiao, Licheng [1]
Affiliation
[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
remote sensing image captioning; cross-modal interaction; attention mechanism; semantic information; encoder-decoder; TRANSFORMER; NETWORK;
DOI
10.3390/rs15030579
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The aim of remote sensing image captioning (RSIC) is to describe a given remote sensing image (RSI) using coherent sentences. Most existing attention-based methods model the coherence through an LSTM-based decoder, which dynamically infers a word vector from preceding sentences. However, these methods are indirectly guided through the confusion of attentive regions, as (1) the weighted average in the attention mechanism distracts the word vector from capturing pertinent visual regions and (2) there are few constraints or rewards for learning long-range transitions. In this paper, we propose a multi-source interactive stair attention mechanism that separately models the semantics of preceding sentences and visual regions of interest. Specifically, the multi-source interaction takes previous semantic vectors as queries and applies an attention mechanism on regional features to acquire the next word vector, which reduces immediate hesitation by considering linguistics. The stair attention divides the attentive weights into three levels (the core region, the surrounding region, and other regions), so that each region in the search scope receives a different degree of focus. Then, a CIDEr-based reward reinforcement learning is devised in order to enhance the quality of the generated sentences. Comprehensive experiments on widely used benchmarks (i.e., the Sydney-Captions, UCM-Captions, and RSICD data sets) demonstrate the superiority of the proposed model over state-of-the-art models in terms of its coherence, while maintaining high accuracy.
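The three-level weighting described in the abstract can be illustrated with a small sketch. This is not the authors' formulation, which the abstract does not spell out. It assumes that regional features lie on a row-major grid, that the core region is the single highest-weighted cell, that the surrounding region is its 8-neighbourhood, and that each level is rescaled by a fixed factor. The function name `stair_attention` and the parameter `level_scale` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stair_attention(query, regions, grid_w, level_scale=(1.0, 0.5, 0.1)):
    """Illustrative three-level ("stair") re-weighting of attention.

    query       -- semantic vector from the preceding words (list of floats)
    regions     -- regional feature vectors, row-major on a grid of width grid_w
    level_scale -- ASSUMED factors for core / surrounding / other regions
    """
    d = len(query)
    # Scaled dot-product scores between the query and every region.
    scores = [sum(q * r for q, r in zip(query, feat)) / math.sqrt(d)
              for feat in regions]
    weights = softmax(scores)

    # Assumption: the core region is the single most-attended grid cell.
    core = max(range(len(weights)), key=weights.__getitem__)
    core_row, core_col = divmod(core, grid_w)

    staired = []
    for i, w in enumerate(weights):
        row, col = divmod(i, grid_w)
        # Chebyshev distance: 0 = core, 1 = surrounding ring, >1 = other.
        dist = max(abs(row - core_row), abs(col - core_col))
        level = min(dist, 2)
        staired.append(w * level_scale[level])

    # Renormalise so the staired weights still sum to one.
    total = sum(staired)
    staired = [w / total for w in staired]

    # Context vector: weighted sum of the regional features.
    dim = len(regions[0])
    context = [sum(w * feat[k] for w, feat in zip(staired, regions))
               for k in range(dim)]
    return staired, context, core
```

With the default factors, the core cell keeps its full weight while nearby and distant cells are progressively damped, which is one plausible reading of regions being "focused on differently".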
Pages: 22