Enhanced Transformer for Remote-Sensing Image Captioning with Positional-Channel Semantic Fusion

Times cited: 0

Authors:

Zhao, An [1]

Yang, Wenzhong [1,2]

Chen, Danny [1]

Wei, Fuyuan [1]

Affiliations:

[1] Xinjiang University, School of Computer Science and Technology, Urumqi 830017, People's Republic of China

[2] Xinjiang University, Xinjiang Key Laboratory of Multilingual Information Technology, Urumqi 830017, People's Republic of China

Source:

ELECTRONICS | 2024 / Vol. 13 / Issue 18

Funding:

National Natural Science Foundation of China

Keywords:

remote-sensing image captioning; semantic information and relationship; spatial and channel dependencies; semantic fusion;

DOI:

10.3390/electronics13183605

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Remote-sensing image captioning (RSIC) aims to generate descriptive sentences for images by capturing both local and global semantic information. The task is challenging because remote-sensing images contain diverse object types and varied scenes. To address these challenges, we propose a positional-channel semantic fusion transformer (PCSFTr). PCSFTr uses scene classification to extract initial visual features and learn semantic information. A novel positional-channel multi-headed self-attention (PCMSA) block captures spatial and channel dependencies simultaneously, enriching the semantic information, and a feature fusion (FF) module further strengthens the model's understanding of semantic relationships. Experimental results show that PCSFTr significantly outperforms existing methods: its BLEU-4 score reaches 78.42% on UCM-caption, 54.42% on RSICD, and 69.01% on NWPU-captions. This research offers new insights into RSIC by providing a more comprehensive understanding of the semantic information and relationships within images and by improving the performance of image captioning models.
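The abstract states only that the PCMSA block captures spatial and channel dependencies simultaneously; it gives no equations. As a rough illustration, the NumPy sketch below shows one common way such a block can be built, under stated assumptions: multi-head scaled dot-product attention across grid positions (the "positional" branch), a parameter-free C×C attention across channels in the spirit of the channel attention used in dual attention networks (the "channel" branch, a swapped-in technique), and fusion by element-wise sum plus a residual connection. The function names `positional_mhsa`, `channel_attention`, and `pc_attention_block`, the fusion rule, and all sizes are hypothetical and are not PCSFTr's actual design.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def positional_mhsa(x, wq, wk, wv, num_heads):
    """Multi-head self-attention across the N spatial positions.

    x: (N, C) grid features flattened to N tokens with C channels.
    Returns (N, C); each position attends to every other position.
    """
    n, c = x.shape
    assert c % num_heads == 0, "C must be divisible by num_heads"
    d = c // num_heads
    # Project, then split channels into heads: shape (heads, N, d).
    q = (x @ wq).reshape(n, num_heads, d).transpose(1, 0, 2)
    k = (x @ wk).reshape(n, num_heads, d).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, num_heads, d).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    out = attn @ v                                          # (heads, N, d)
    return out.transpose(1, 0, 2).reshape(n, c)             # merge heads


def channel_attention(x):
    """Parameter-free self-attention across the C channels (C x C affinity).

    Each channel becomes a weighted mix of all channels, weighted by
    channel-to-channel similarity computed over all positions.
    """
    xt = x.T                                          # (C, N)
    attn = softmax(xt @ xt.T / np.sqrt(xt.shape[1]))  # (C, C)
    return (attn @ xt).T                              # (N, C)


def pc_attention_block(x, wq, wk, wv, num_heads=8):
    """Hypothetical fusion: sum both branches and add a residual."""
    return x + positional_mhsa(x, wq, wk, wv, num_heads) + channel_attention(x)


# Usage with made-up sizes: a 7x7 feature grid (N = 49) with C = 64 channels.
rng = np.random.default_rng(0)
N, C = 49, 64
x = rng.standard_normal((N, C))
w = [rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3)]
y = pc_attention_block(x, *w, num_heads=8)
print(y.shape)  # (49, 64)
```

Because both branches return an (N, C) array, the sketch keeps the input shape and could be stacked like an ordinary transformer encoder layer; the paper's FF module may combine features quite differently.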

Pages: 17

References: 50
