Enhanced Transformer for Remote-Sensing Image Captioning with Positional-Channel Semantic Fusion

Cited by: 0
Authors
Zhao, An [1]
Yang, Wenzhong [1, 2]
Chen, Danny [1]
Wei, Fuyuan [1]
Affiliations
[1] Xinjiang Univ, Sch Comp Sci & Technol, Urumqi 830017, Peoples R China
[2] Xinjiang Univ, Xinjiang Key Lab Multilingual Informat Technol, Urumqi 830017, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
remote-sensing image captioning; semantic information and relationship; spatial and channel dependencies; semantic fusion
DOI
10.3390/electronics13183605
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Remote-sensing image captioning (RSIC) aims to generate descriptive sentences for remote-sensing images by capturing both local and global semantic information. The task is challenging because remote-sensing images contain diverse object types and varying scenes. To address these challenges, we propose a positional-channel semantic fusion transformer (PCSFTr). The PCSFTr model first uses scene classification to extract visual features and learn semantic information. A novel positional-channel multi-headed self-attention (PCMSA) block then captures spatial and channel dependencies simultaneously, enriching the semantic information. A feature fusion (FF) module further enhances the understanding of semantic relationships. Experimental results show that PCSFTr significantly outperforms existing methods: the BLEU-4 score reached 78.42% on UCM-caption, 54.42% on RSICD, and 69.01% on NWPU-captions. This research provides new insights into RSIC by offering a more comprehensive understanding of the semantic information and relationships within images and by improving the performance of image captioning models.
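For readers unfamiliar with the BLEU-4 metric quoted in the abstract, the following is a minimal sketch of unsmoothed, sentence-level BLEU-4 in plain Python. It is an illustration only: the record does not say which evaluation toolkit the authors used, published scores are usually computed at corpus level, and the function names `ngrams` and `bleu4` as well as the example captions are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Unsmoothed sentence-level BLEU-4 (hypothetical helper, not the paper's code)."""
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the four modified precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / 4)


# Made-up example captions, already tokenized on whitespace.
reference = "many planes are parked at the airport".split()
candidate = "many planes are parked near the airport".split()
score = bleu4(candidate, [reference])
```

A candidate identical to one of its references scores 1.0, and a candidate that shares no 4-gram with any reference scores 0.0 under this unsmoothed variant.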
Pages: 17