Enhanced Transformer for Remote-Sensing Image Captioning with Positional-Channel Semantic Fusion

Cited by: 0
Authors
Zhao, An [1]
Yang, Wenzhong [1,2]
Chen, Danny [1]
Wei, Fuyuan [1]
Affiliations
[1] Xinjiang Univ, Sch Comp Sci & Technol, Urumqi 830017, Peoples R China
[2] Xinjiang Univ, Xinjiang Key Lab Multilingual Informat Technol, Urumqi 830017, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
remote-sensing image captioning; semantic information and relationship; spatial and channel dependencies; semantic fusion;
DOI
10.3390/electronics13183605
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Remote-sensing image captioning (RSIC) aims to generate descriptive sentences for remote-sensing images by capturing both local and global semantic information. This task is challenging due to the diverse object types and varying scenes in remote-sensing images. To address these challenges, we propose a positional-channel semantic fusion transformer (PCSFTr). The PCSFTr model first employs scene classification to extract visual features and learn semantic information. A novel positional-channel multi-headed self-attention (PCMSA) block then captures spatial and channel dependencies simultaneously, enriching the semantic information. The feature fusion (FF) module further enhances the understanding of semantic relationships. Experimental results show that PCSFTr significantly outperforms existing methods. Specifically, the BLEU-4 score reached 78.42% on UCM-caption, 54.42% on RSICD, and 69.01% on NWPU-captions. This research provides new insights into RSIC by offering a more comprehensive understanding of semantic information and relationships within images and by improving the performance of image captioning models.
Pages: 17