Enhanced Transformer for Remote-Sensing Image Captioning with Positional-Channel Semantic Fusion

Cited by: 0
Authors
Zhao, An [1 ]
Yang, Wenzhong [1 ,2 ]
Chen, Danny [1 ]
Wei, Fuyuan [1 ]
Affiliations
[1] Xinjiang Univ, Sch Comp Sci & Technol, Urumqi 830017, Peoples R China
[2] Xinjiang Univ, Xinjiang Key Lab Multilingual Informat Technol, Urumqi 830017, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
remote-sensing image captioning; semantic information and relationship; spatial and channel dependencies; semantic fusion;
DOI
10.3390/electronics13183605
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Remote-sensing image captioning (RSIC) aims to generate descriptive sentences for images by capturing both local and global semantic information. The task is challenging because remote-sensing images contain diverse object types and widely varying scenes. To address these challenges, we propose a positional-channel semantic fusion transformer (PCSFTr). The PCSFTr model employs scene classification to initially extract visual features and learn semantic information. A novel positional-channel multi-headed self-attention (PCMSA) block captures spatial and channel dependencies simultaneously, enriching the semantic information. A feature fusion (FF) module further enhances the understanding of semantic relationships. Experimental results show that PCSFTr significantly outperforms existing methods: the BLEU-4 score reaches 78.42% on UCM-captions, 54.42% on RSICD, and 69.01% on NWPU-captions. This research provides new insights into RSIC by offering a more comprehensive understanding of semantic information and relationships within images, thereby improving the performance of image captioning models.
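For readers who want a concrete picture of what "capturing spatial and channel dependencies simultaneously" can look like, the sketch below is a minimal PyTorch approximation of that idea, assuming a flattened CNN/ViT feature map as input. The module name, tensor shapes, and the concatenate-and-project fusion with a residual connection are illustrative assumptions; this does not reproduce the authors' PCMSA or FF implementation.

# Illustrative sketch only: spatial multi-head self-attention combined with a
# channel-affinity branch, then fused. All names, shapes, and the fusion scheme
# are assumptions, not the paper's code.
import torch
import torch.nn as nn


class PositionalChannelAttention(nn.Module):
    """Hypothetical block: spatial self-attention + channel affinity, fused."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Spatial branch: standard multi-head self-attention where the tokens
        # are flattened feature-map positions.
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Channel branch: projection applied after re-weighting channels with a
        # channel-to-channel affinity matrix.
        self.channel_proj = nn.Linear(dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)   # concatenate both branches, project back
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_positions, dim) -- flattened visual features.
        spatial, _ = self.spatial_attn(x, x, x)            # position-wise dependencies

        # Channel dependencies: (dim x dim) affinity computed from the same tokens.
        affinity = torch.softmax(x.transpose(1, 2) @ x / x.shape[1] ** 0.5, dim=-1)
        channel = self.channel_proj(x @ affinity)          # re-weighted channels

        fused = self.fuse(torch.cat([spatial, channel], dim=-1))
        return self.norm(x + fused)                        # residual connection


if __name__ == "__main__":
    feats = torch.randn(2, 49, 512)          # e.g. a 7x7 feature map, 512 channels
    block = PositionalChannelAttention(dim=512)
    print(block(feats).shape)                # torch.Size([2, 49, 512])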
Pages: 17
Related Papers
50 records in total
  • [1] Remote-Sensing Image Captioning Based on Multilayer Aggregated Transformer
    Liu, Chenyang
    Zhao, Rui
    Shi, Zhenwei
    IEEE Geoscience and Remote Sensing Letters, 2022, 19
  • [2] CMTFNet: CNN and Multiscale Transformer Fusion Network for Remote-Sensing Image Semantic Segmentation
    Wu, Honglin
    Huang, Peng
    Zhang, Min
    Tang, Wenlong
    Yu, Xinyu
    IEEE Transactions on Geoscience and Remote Sensing, 2023, 61
  • [3] CTFNet: CNN-Transformer Fusion Network for Remote-Sensing Image Semantic Segmentation
    Wu, Honglin
    Huang, Peng
    Zhang, Min
    Tang, Wenlong
    IEEE Geoscience and Remote Sensing Letters, 2024, 21: 1-5
  • [4] CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation
    Chen, Xin
    Li, Dongfen
    Liu, Mingzhe
    Jia, Jiaru
    Remote Sensing, 2023, 15 (18)
  • [5] Multi-label semantic feature fusion for remote sensing image captioning
    Wang, Shuang
    Ye, Xiutiao
    Gu, Yu
    Wang, Jihui
    Meng, Yun
    Tian, Jingxian
    Hou, Biao
    Jiao, Licheng
    ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 184: 1-18
  • [6] Multiscale Methods for Optical Remote-Sensing Image Captioning
    Ma, Xiaofeng
    Zhao, Rui
    Shi, Zhenwei
    IEEE Geoscience and Remote Sensing Letters, 2021, 18 (11): 2001-2005
  • [7] Cooperative Connection Transformer for Remote Sensing Image Captioning
    Zhao, Kai
    Xiong, Wei
    IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 1-14
  • [8] Adaptive Semantic-Enhanced Transformer for Image Captioning
    Zhang, Jing
    Fang, Zhongjun
    Sun, Han
    Wang, Zhe
    IEEE Transactions on Neural Networks and Learning Systems, 2024, 35 (02): 1785-1796
  • [9] Region-guided transformer for remote sensing image captioning
    Zhao, Kai
    Xiong, Wei
    International Journal of Digital Earth, 2024, 17 (01)