MCFT: Multimodal Contrastive Fusion Transformer for Classification of Hyperspectral Image and LiDAR Data

Authors
Feng, Yining [1]
Jin, Jiarui [2]
Yin, Yin [2]
Song, Chuanming [3]
Wang, Xianghai [1, 2]
Affiliations
[1] Liaoning Normal Univ, Sch Geog, Dalian 116029, Peoples R China
[2] Liaoning Normal Univ, Sch Comp Sci & Artificial Intelligence, Dalian 116029, Peoples R China
[3] Dalian Univ, Sch Informat Engn, Dalian 116622, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Transformers; Laser radar; Data mining; Convolutional neural networks; Computer vision; Accuracy; Head; Electronic mail; Data models; Contrastive learning; deep learning (DL); feature alignment; feature matching; HS-LiDAR fusion and classification; vision transformer (ViT);
DOI
10.1109/TGRS.2024.3490752
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708; 070902;
Abstract
Multisource remote sensing (RS) image fusion leverages data from various sensors to enhance the accuracy and comprehensiveness of Earth observation. Notably, the fusion of hyperspectral (HS) images and light detection and ranging (LiDAR) data has garnered significant attention because the two modalities carry complementary features. However, current methods predominantly rely on simple techniques such as weight sharing, feature superposition, or feature products, which accumulate features rather than fuse them in an integrated way. The transformer framework, with its self-attention mechanism, offers potential for effective multimodal data fusion, but the simple linear transformations it uses for feature extraction may not adequately capture all relevant information. To address these challenges, we propose a novel multimodal contrastive fusion transformer (MCFT). Our approach employs convolutional neural networks (CNNs) to extract features from each modality and uses transformer networks for fusion. We modify the basic transformer architecture and propose a double position embedding mode to make it more suitable for RS image processing tasks. We also introduce two novel modules, a feature alignment module and a feature matching module, designed to exploit both paired and unpaired samples. These modules facilitate more effective cross-modal learning by emphasizing the commonalities within the same features and the differences between features from distinct modalities. Experimental evaluations on several publicly available HS-LiDAR datasets demonstrate that the proposed method consistently outperforms existing advanced methods. The source code for our approach is available at: https://github.com/SYFYN0317/MCFT.
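The abstract does not give the loss used by the feature alignment module, so the following is only a minimal sketch of one common way to contrast paired and unpaired cross-modal samples: a symmetric InfoNCE loss. It assumes that row i of the HS embeddings and row i of the LiDAR embeddings come from the same pixel patch (a paired sample) and that every other combination in the batch is an unpaired negative. The function names `l2_normalize` and `info_nce` and the `temperature` parameter are hypothetical and are not taken from the MCFT repository.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(hs_feats, lidar_feats, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of embeddings (illustrative only).

    Assumption: hs_feats[i] and lidar_feats[i] form a paired sample;
    all other (i, j) combinations are treated as unpaired negatives.
    """
    hs = [l2_normalize(v) for v in hs_feats]
    li = [l2_normalize(v) for v in lidar_feats]
    n = len(hs)

    # Cosine similarity between every HS and LiDAR embedding, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(hs[i], li[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / len(rows)

    # Average the HS-to-LiDAR and LiDAR-to-HS directions.
    cols = [list(c) for c in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))


# Paired embeddings that agree give a much smaller loss than mismatched ones.
hs_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(hs_batch, [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce(hs_batch, [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing a loss of this kind pulls paired HS and LiDAR embeddings together and pushes unpaired ones apart, which is the general idea the abstract describes; the authors' actual modules may differ, and the linked repository is the authoritative source.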
Pages: 17
Related papers
50 in total
  • [31] COMBINING FEATURE FUSION AND DECISION FUSION FOR CLASSIFICATION OF HYPERSPECTRAL AND LIDAR DATA
    Liao, Wenzhi
    Bellens, Rik
    Pizurica, Aleksandra
    Gautama, Sidharta
    Philips, Wilfried
    2014 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2014, : 1241 - 1244
  • [32] Dual selective fusion transformer network for hyperspectral image classification
    Xu, Yichu
    Wang, Di
    Zhang, Lefei
    Zhang, Liangpei
    NEURAL NETWORKS, 2025, 187
  • [33] Transformer-Based Masked Autoencoder With Contrastive Loss for Hyperspectral Image Classification
    Cao, Xianghai
    Lin, Haifeng
    Guo, Shuaixu
    Xiong, Tao
    Jiao, Licheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [34] Convolution Transformer Fusion Splicing Network for Hyperspectral Image Classification
    Zhao, Feng
    Li, Shijie
    Zhang, Junjie
    Liu, Hanqiang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [36] ConVaT: A Variational Generative Transformer With Momentum Contrastive Learning for Hyperspectral Image Classification
    Liang, Miaomiao
    Liu, Zuo
    Dong, Jian
    Yu, Lingjuan
    Yu, Xiangchun
    Li, Jun
    Jiao, Licheng
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [37] S2EFT: Spectral-Spatial-Elevation Fusion Transformer for hyperspectral image and LiDAR classification
    Feng, Yining
    Zhu, Junheng
    Song, Ruoxi
    Wang, Xianghai
    KNOWLEDGE-BASED SYSTEMS, 2024, 283
  • [38] TMCFN: Text-Supervised Multidimensional Contrastive Fusion Network for Hyperspectral and LiDAR Classification
    Yang, Yueguang
    Qu, Jiahui
    Dong, Wenqian
    Zhang, Tongzhen
    Xiao, Song
    Li, Yunsong
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 18 - 18
  • [39] TMCFN: Text-Supervised Multidimensional Contrastive Fusion Network for Hyperspectral and LiDAR Classification
    Yang, Yueguang
    Qu, Jiahui
    Dong, Wenqian
    Zhang, Tongzhen
    Xiao, Song
    Li, Yunsong
    IEEE Transactions on Geoscience and Remote Sensing, 2024, 62 : 1 - 15
  • [40] DISCRIMINATIVE FEATURE EXTRACTION AND FUSION FOR CLASSIFICATION OF HYPERSPECTRAL AND LIDAR DATA
    Song, Weiwei
    Gao, Zhi
    Zhang, Yongjun
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 2271 - 2274