Multi-Source Interactive Stair Attention for Remote Sensing Image Captioning

Cited by: 17
Authors
Zhang, Xiangrong [1]
Li, Yunpeng [1]
Wang, Xin [1]
Liu, Feixiang [1]
Wu, Zhaoji [1]
Cheng, Xina [1]
Jiao, Licheng [1]
Affiliation
[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
remote sensing image captioning; cross-modal interaction; attention mechanism; semantic information; encoder-decoder; TRANSFORMER; NETWORK;
DOI
10.3390/rs15030579
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The aim of remote sensing image captioning (RSIC) is to describe a given remote sensing image (RSI) using coherent sentences. Most existing attention-based methods model the coherence through an LSTM-based decoder, which dynamically infers a word vector from preceding sentences. However, these methods are indirectly guided through the confusion of attentive regions, as (1) the weighted average in the attention mechanism distracts the word vector from capturing pertinent visual regions and (2) there are few constraints or rewards for learning long-range transitions. In this paper, we propose a multi-source interactive stair attention mechanism that separately models the semantics of preceding sentences and visual regions of interest. Specifically, the multi-source interaction takes previous semantic vectors as queries and applies an attention mechanism on regional features to acquire the next word vector, which reduces immediate hesitation by considering linguistics. The stair attention divides the attentive weights into three levels (the core region, the surrounding region, and other regions), so that regions in the search scope receive different degrees of focus. A CIDEr-based reward for reinforcement learning is then devised to enhance the quality of the generated sentences. Comprehensive experiments on widely used benchmarks (i.e., the Sydney-Captions, UCM-Captions, and RSICD data sets) demonstrate the superiority of the proposed model over state-of-the-art models in terms of coherence, while maintaining high accuracy.
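The abstract describes the attention only in words, so the following is a minimal Python sketch of the general idea, not the authors' implementation. It assumes scaled dot-product scoring, a rectangular grid of regional features, a core region chosen as the highest-weighted cell, a surrounding region defined as its immediate grid neighbours, and fixed illustrative scale factors for the three levels. The function names (`attend`, `stair_weights`, `context_vector`) and all parameter values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, regions):
    """Score each regional feature against the query (assumed here to be
    the previous semantic vector) with scaled dot-product attention."""
    d = len(query)
    scores = [
        sum(q * r for q, r in zip(query, region)) / math.sqrt(d)
        for region in regions
    ]
    return softmax(scores)


def stair_weights(weights, grid_w, core_scale=1.0,
                  surround_scale=0.5, other_scale=0.1):
    """Rescale attention weights into three levels and renormalize.

    Assumption: the core region is the highest-weighted cell, the
    surrounding region is its 8-neighbourhood on the grid, and all
    remaining cells are 'other'. The scale factors are illustrative.
    """
    core = max(range(len(weights)), key=weights.__getitem__)
    core_row, core_col = divmod(core, grid_w)
    staired = []
    for i, w in enumerate(weights):
        row, col = divmod(i, grid_w)
        # Chebyshev distance: 0 = core, 1 = surrounding, >1 = other.
        dist = max(abs(row - core_row), abs(col - core_col))
        if dist == 0:
            scale = core_scale
        elif dist == 1:
            scale = surround_scale
        else:
            scale = other_scale
        staired.append(w * scale)
    total = sum(staired)
    return [s / total for s in staired]


def context_vector(weights, regions):
    """Weighted sum of regional features under the given weights."""
    dim = len(regions[0])
    return [
        sum(w * region[k] for w, region in zip(weights, regions))
        for k in range(dim)
    ]


# Toy 3x3 grid of 2-dimensional regional features and a toy query.
regions = [[float(i % 3), float(i // 3)] for i in range(9)]
query = [1.0, 1.0]
plain = attend(query, regions)
staired = stair_weights(plain, grid_w=3)
context = context_vector(staired, regions)
```

Compared with the plain weighted average, the stepped rescaling concentrates more of the weight on the core cell and its neighbours, which is one way to read the abstract's claim that the weighted average distracts the word vector from pertinent regions.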
Pages: 22