MCFT: Multimodal Contrastive Fusion Transformer for Classification of Hyperspectral Image and LiDAR Data

Cited by: 0

Authors:

Feng, Yining ^{[1]}

Jin, Jiarui ^{[2]}

Yin, Yin ^{[2]}

Song, Chuanming ^{[3]}

Wang, Xianghai ^{[1,2]}

Affiliations:

[1] Liaoning Normal Univ, Sch Geog, Dalian 116029, Peoples R China

[2] Liaoning Normal Univ, Sch Comp Sci & Artificial Intelligence, Dalian 116029, Peoples R China

[3] Dalian Univ, Sch Informat Engn, Dalian 116622, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Transformers; Laser radar; Data mining; Convolutional neural networks; Computer vision; Accuracy; Head; Electronic mail; Data models; Contrastive learning; deep learning (DL); feature alignment; feature matching; HS-LiDAR fusion and classification; vision transformer (ViT);

DOI:

10.1109/TGRS.2024.3490752

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification code:

0708 ; 070902 ;

Abstract:

Multisource remote sensing (RS) image fusion leverages data from various sensors to enhance the accuracy and comprehensiveness of Earth observation. Notably, the fusion of hyperspectral (HS) images and light detection and ranging (LiDAR) data has garnered significant attention due to their complementary features. However, current methods predominantly rely on simplistic techniques such as weight sharing, feature superposition, or feature products, which often fall short of achieving true feature fusion. These methods primarily focus on feature accumulation rather than integrative fusion. The transformer framework, with its self-attention mechanisms, offers potential for effective multimodal data fusion. However, simple linear transformations used in feature extraction may not adequately capture all relevant information. To address these challenges, we propose a novel multimodal contrastive fusion transformer (MCFT). Our approach employs convolutional neural networks (CNNs) for feature extraction from different modalities and leverages transformer networks for advanced fusion. We modify the basic transformer architecture and propose a double position embedding mode to make it more suitable for RS image processing tasks. We introduce two novel modules, a feature alignment module and a feature matching module, designed to exploit both paired and unpaired samples. These modules facilitate more effective cross-modal learning by emphasizing the commonalities within the same features and the differences between features from distinct modalities. Experimental evaluations on several publicly available HS-LiDAR datasets demonstrate that the proposed method consistently outperforms existing advanced methods. The source code for our approach is available at: https://github.com/SYFYN0317/MCFT.

Pages: 17

50 items in total

[31] COMBINING FEATURE FUSION AND DECISION FUSION FOR CLASSIFICATION OF HYPERSPECTRAL AND LIDAR DATA
Liao, Wenzhi
Bellens, Rik
Pizurica, Aleksandra
Gautama, Sidharta
Philips, Wilfried
2014 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2014, : 1241 - 1244
[32] Dual selective fusion transformer network for hyperspectral image classification
Xu, Yichu
Wang, Di
Zhang, Lefei
Zhang, Liangpei
NEURAL NETWORKS, 2025, 187
[33] Transformer-Based Masked Autoencoder With Contrastive Loss for Hyperspectral Image Classification
Cao, Xianghai
Lin, Haifeng
Guo, Shuaixu
Xiong, Tao
Jiao, Licheng
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
[34] Convolution Transformer Fusion Splicing Network for Hyperspectral Image Classification
Zhao, Feng
Li, Shijie
Zhang, Junjie
Liu, Hanqiang
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
[36] ConVaT: A Variational Generative Transformer With Momentum Contrastive Learning for Hyperspectral Image Classification
Liang, Miaomiao
Liu, Zuo
Dong, Jian
Yu, Lingjuan
Yu, Xiangchun
Li, Jun
Jiao, Licheng
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
[37] S2EFT: Spectral-Spatial-Elevation Fusion Transformer for hyperspectral image and LiDAR classification
Feng, Yining
Zhu, Junheng
Song, Ruoxi
Wang, Xianghai
KNOWLEDGE-BASED SYSTEMS, 2024, 283
[38] TMCFN: Text-Supervised Multidimensional Contrastive Fusion Network for Hyperspectral and LiDAR Classification
Yang, Yueguang
Qu, Jiahui
Dong, Wenqian
Zhang, Tongzhen
Xiao, Song
Li, Yunsong
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 18 - 18
[39] TMCFN: Text-Supervised Multidimensional Contrastive Fusion Network for Hyperspectral and LiDAR Classification
Yang, Yueguang
Qu, Jiahui
Dong, Wenqian
Zhang, Tongzhen
Xiao, Song
Li, Yunsong
IEEE Transactions on Geoscience and Remote Sensing, 2024, 62 : 1 - 15
[40] DISCRIMINATIVE FEATURE EXTRACTION AND FUSION FOR CLASSIFICATION OF HYPERSPECTRAL AND LIDAR DATA
Song, Weiwei
Gao, Zhi
Zhang, Yongjun
2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 2271 - 2274