Enhanced Transformer for Remote-Sensing Image Captioning with Positional-Channel Semantic Fusion

Cited: 0
Authors
Zhao, An [1]
Yang, Wenzhong [1,2]
Chen, Danny [1]
Wei, Fuyuan [1]
Affiliations
[1] Xinjiang Univ, Sch Comp Sci & Technol, Urumqi 830017, Peoples R China
[2] Xinjiang Univ, Xinjiang Key Lab Multilingual Informat Technol, Urumqi 830017, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
remote-sensing image captioning; semantic information and relationship; spatial and channel dependencies; semantic fusion;
DOI
10.3390/electronics13183605
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Remote-sensing image captioning (RSIC) aims to generate descriptive sentences for remote-sensing images by capturing both local and global semantic information. This task is challenging due to the diverse object types and varying scenes in remote-sensing images. To address these challenges, we propose a positional-channel semantic fusion transformer (PCSFTr). The PCSFTr model employs scene classification to initially extract visual features and learn semantic information. A novel positional-channel multi-headed self-attention (PCMSA) block captures spatial and channel dependencies simultaneously, enriching the semantic information. The feature fusion (FF) module further enhances the understanding of semantic relationships. Experimental results show that PCSFTr significantly outperforms existing methods: the BLEU-4 score reaches 78.42% on UCM-Captions, 54.42% on RSICD, and 69.01% on NWPU-Captions. This research provides new insights into RSIC by offering a more comprehensive understanding of semantic information and relationships within images and by improving the performance of image-captioning models.
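Note: the abstract describes the PCMSA block only at a high level. The PyTorch-style snippet below is a minimal, hypothetical sketch of how an attention block combining spatial (positional) and channel dependencies could be structured: a standard multi-head self-attention branch over spatial positions, a gating branch over feature channels, and a learned fusion with a residual connection. The class name, fusion strategy, and dimensions are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of a positional-channel attention block (not the authors' code).
import torch
import torch.nn as nn

class PositionalChannelAttention(nn.Module):
    """Combines spatial multi-head self-attention with a channel-attention
    branch and fuses the two outputs. Names and fusion are assumptions."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Spatial branch: standard multi-head self-attention over positions.
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Channel branch: gating weights over feature channels.
        self.channel_gate = nn.Sequential(
            nn.Linear(dim, dim // 4),
            nn.ReLU(inplace=True),
            nn.Linear(dim // 4, dim),
            nn.Sigmoid(),
        )
        # Learned fusion of the two branches, plus residual normalization.
        self.fuse = nn.Linear(2 * dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim) visual features from the encoder.
        spatial_out, _ = self.spatial_attn(x, x, x)            # spatial dependencies
        channel_weights = self.channel_gate(x.mean(dim=1))     # (batch, dim)
        channel_out = x * channel_weights.unsqueeze(1)         # channel dependencies
        fused = self.fuse(torch.cat([spatial_out, channel_out], dim=-1))
        return self.norm(x + fused)                            # residual connection

# Example usage with dummy features (e.g., a 7x7 grid of 512-d features).
feats = torch.randn(2, 49, 512)
block = PositionalChannelAttention(dim=512)
print(block(feats).shape)  # torch.Size([2, 49, 512])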
Pages: 17
Related Papers
50 records in total
  • [31] Learning consensus-aware semantic knowledge for remote sensing image captioning
    Li, Yunpeng
    Zhang, Xiangrong
    Cheng, Xina
    Tang, Xu
    Jiao, Licheng
    PATTERN RECOGNITION, 2024, 145
  • [32] Semantic-Spatial Collaborative Perception Network for Remote Sensing Image Captioning
    Wang, Qi
    Yang, Zhigang
    Ni, Weiping
    Wu, Junzheng
    Li, Qiang
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [33] Cross-Modal Retrieval and Semantic Refinement for Remote Sensing Image Captioning
    Li, Zhengxin
    Zhao, Wenzhe
    Du, Xuanyi
    Zhou, Guangyao
    Zhang, Songlin
    REMOTE SENSING, 2024, 16 (01)
  • [34] Remote sensing image semantic segmentation based on cascaded Transformer
    Wang F.
    Ji J.
    Wang Y.
IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024
  • [35] Global and edge enhanced transformer for semantic segmentation of remote sensing
    Wang, Hengyou
    Li, Xiao
    Huo, Lianzhi
    Hu, Changmiao
    APPLIED INTELLIGENCE, 2024, 54 (07) : 5658 - 5673
  • [36] Fusion of multisensor and multitemporal data in remote-sensing image analysis
    Bruzzone, L
    Serpico, SB
    IGARSS '98 - 1998 INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, PROCEEDINGS VOLS 1-5: SENSING AND MANAGING THE ENVIRONMENT, 1998, : 162 - 164
  • [37] Spatial Dynamic Selection Network for Remote-Sensing Image Fusion
    Hu, Jianwen
    Hu, Pei
    Wang, Zeping
    Kang, Xudong
    Fan, Shaosheng
    Mao, Dun
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [38] Aware-Transformer: A Novel Pure Transformer-Based Model for Remote Sensing Image Captioning
    Cao, Yukun
    Yan, Jialuo
    Tang, Yijia
    He, Zhenyi
    Xu, Kangle
    Cheng, Yu
    ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT I, 2024, 14495 : 105 - 117
  • [39] Spatiotemporal Remote-Sensing Image Fusion With Patch-Group Compressed Sensing
    Li, Lei
    Liu, Peng
    Wu, Jie
    Wang, Lizhe
    He, Guojin
    IEEE ACCESS, 2020, 8 (08): : 209199 - 209211
  • [40] Deep Hash Remote-Sensing Image Retrieval Assisted by Semantic Cues
    Liu, Pingping
    Liu, Zetong
    Shan, Xue
    Zhou, Qiuzhan
    REMOTE SENSING, 2022, 14 (24)