Light Self-Gaussian-Attention Vision Transformer for Hyperspectral Image Classification

Cited by: 13
Authors
Ma, Chao [1 ,2 ]
Wan, Minjie [1 ,2 ]
Wu, Jian [3 ]
Kong, Xiaofang [4 ]
Shao, Ajun [1 ,2 ]
Wang, Fan [1 ,2 ]
Chen, Qian [1 ,2 ]
Gu, Guohua [1 ,2 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Elect & Opt Engn, Nanjing 210094, Peoples R China
[2] Nanjing Univ Sci & Technol, Jiangsu Key Lab Spectral Imaging & Intelligent Sen, Nanjing 210094, Peoples R China
[3] Southeast Univ, Sch Comp Sci & Engn, Nanjing 211189, Peoples R China
[4] Nanjing Univ Sci & Technol, Natl Key Lab Transient Phys, Nanjing 210094, Peoples R China
Keywords
Feature extraction; Transformers; Principal component analysis; Computational modeling; Task analysis; Data mining; Correlation; Gaussian position module; hybrid spatial-spectral tokenizer; hyperspectral image (HSI) classification; light self-Gaussian attention (LSGA); location-aware long-distance modeling; NETWORK;
DOI
10.1109/TIM.2023.3279922
CLC classification numbers
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Subject classification numbers
0808; 0809
Abstract
In recent years, convolutional neural networks (CNNs) have been widely used in hyperspectral image (HSI) classification because of their exceptional performance in local feature extraction. However, due to the local connectivity and weight-sharing properties of the convolution kernel, CNNs are limited in long-distance modeling, and deeper networks tend to increase computational costs. To address these issues, this article proposes a vision Transformer (VIT) based on the light self-Gaussian-attention (LSGA) mechanism, which extracts global deep semantic features. First, the hybrid spatial-spectral tokenizer module extracts shallow spatial-spectral features and expands image patches to generate tokens. Next, the light self-attention uses Q (query), X (the original input), and X in place of Q, K (key), and V (value) to reduce computation and parameters. Furthermore, to prevent the lack of location information from aliasing central and neighborhood features, we devise a Gaussian absolute position bias that simulates the HSI data distribution and draws the attention weights toward the central query block. Several experiments verify the effectiveness of the proposed method, which outperforms state-of-the-art methods on four datasets. Specifically, we observed a 0.62% accuracy improvement over A2S2K and a 0.11% improvement over SSFTT. In conclusion, the proposed LSGA-VIT method demonstrates promising results in HSI classification and shows potential in addressing the issues of location-aware long-distance modeling and computational cost. Our codes are available at https://github.com/machao132/LSGA-VIT.
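The two mechanisms named in the abstract can be illustrated together in a minimal NumPy sketch: attention computed as attention(Q, X, X) rather than attention(Q, K, V), with a Gaussian bias that favors key positions near the patch center. This is an assumption-laden single-head simplification, not the paper's implementation; the names `w_q` and `sigma` and the square-grid token layout are illustrative choices.

```python
import numpy as np

def lsga_attention(x, w_q, sigma=1.0):
    """Sketch of light self-Gaussian-attention for one head.

    Standard self-attention learns three projections (Q, K, V) of the
    input X. The light variant keeps only the query projection and
    reuses X itself in place of K and V, cutting parameters and FLOPs.
    A Gaussian absolute position bias then pulls attention toward the
    central token of the spatial patch.
    """
    n, d = x.shape                       # n tokens, d channels
    q = x @ w_q                          # the only learned projection here
    scores = q @ x.T / np.sqrt(d)        # X replaces K

    # Gaussian absolute position bias: tokens are assumed to lie on a
    # sqrt(n) x sqrt(n) grid; the bias decays with squared distance
    # from the patch center, so central keys receive more weight.
    side = int(np.sqrt(n))
    coords = np.array([(i // side, i % side) for i in range(n)], dtype=float)
    center = coords.mean(axis=0)
    dist2 = ((coords - center) ** 2).sum(axis=1)
    bias = -dist2 / (2.0 * sigma ** 2)

    scores = scores + bias[None, :]      # bias each key position
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ x                      # X replaces V

# Example: a 3x3 spatial patch flattened to 9 tokens of 4 channels.
out = lsga_attention(np.random.randn(9, 4), np.eye(4), sigma=1.0)
```

Dropping the K and V projections removes two of the three d-by-d weight matrices per head, which is where the "light" parameter savings come from.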
Pages: 12