Adaptive Masked Autoencoder Transformer for image classification

Cited by: 1

Authors:

Chen, Xiangru [1,2]

Liu, Chenjing [1,2]

Hu, Peng

Lin, Jie [1,2]

Gong, Yunhong

Chen, Yingke [4]

Peng, Dezhong [1,3]

Geng, Xue [2]

Affiliations:

[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China

[2] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore 138632, Singapore

[3] Sichuan Newstrong UHD Video Technol Co Ltd, Chengdu 610095, Peoples R China

[4] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England

Source:

APPLIED SOFT COMPUTING | 2024 / Vol. 164

Funding:

National Natural Science Foundation of China;

Keywords:

Vision transformer; Masked image modeling; Image classification;

DOI:

10.1016/j.asoc.2024.111958

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Vision Transformers (ViTs) have exhibited exceptional performance across a broad spectrum of visual tasks. Nonetheless, their computational requirements often surpass those of prevailing CNN-based models. Token sparsity techniques have been employed to alleviate this issue, but they often lose semantic information and consequently degrade performance. To address these challenges, we propose the Adaptive Masked Autoencoder Transformer (AMAT), a method based on masked image modeling. AMAT integrates a novel adaptive masking mechanism and a training objective function for both the pre-training and fine-tuning stages. Our primary objective is to reduce the complexity of Vision Transformer models while also improving their final accuracy. In experiments on the ILSVRC-2012 dataset, our proposed method surpasses the original ViT while saving up to 40% of FLOPs. Moreover, AMAT outperforms the efficient DynamicViT model by 0.1% while saving 4% of FLOPs. On the Places365 dataset, AMAT incurs an accuracy loss of only 0.3% while saving 21% of FLOPs compared to MAE. These findings demonstrate that AMAT reduces computational cost while maintaining a high level of accuracy.
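To make the link between token sparsity and FLOPs savings concrete, here is a rough back-of-the-envelope sketch. It is not AMAT's adaptive masking mechanism, which the abstract does not specify; it assumes a plain ViT encoder in which a fixed fraction of tokens is dropped before the first block, and it counts multiply-accumulates with the common approximation of 12·N·D² + 2·N²·D per block for N tokens of width D. The function names and the `keep_ratio` parameter are hypothetical, chosen only for illustration.

```python
def vit_block_macs(num_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate multiply-accumulates for one standard transformer block."""
    projections = 4 * num_tokens * dim * dim       # Q, K, V and output projections
    attention = 2 * num_tokens * num_tokens * dim  # QK^T and the weighted sum over V
    mlp = 2 * mlp_ratio * num_tokens * dim * dim   # two linear layers of the MLP
    return projections + attention + mlp


def encoder_macs(num_tokens: int, dim: int, depth: int, keep_ratio: float = 1.0) -> int:
    """Cost of `depth` blocks when only a fraction of the tokens is kept.

    Assumption: tokens are dropped uniformly before the first block,
    which is a simplification and not the scheme described in the paper.
    """
    kept = max(1, round(num_tokens * keep_ratio))
    return depth * vit_block_macs(kept, dim)


def relative_savings(num_tokens: int, dim: int, depth: int, keep_ratio: float) -> float:
    """Fraction of encoder compute saved relative to keeping every token."""
    full = encoder_macs(num_tokens, dim, depth)
    sparse = encoder_macs(num_tokens, dim, depth, keep_ratio)
    return 1.0 - sparse / full


if __name__ == "__main__":
    # ViT-B/16 style settings at 224x224: 196 patches + 1 class token.
    for ratio in (1.0, 0.8, 0.6):
        saved = relative_savings(num_tokens=197, dim=768, depth=12, keep_ratio=ratio)
        print(f"keep_ratio={ratio:.1f} -> about {saved:.1%} of MACs saved")
```

Because the attention term grows quadratically with the number of tokens, the savings under this model are slightly larger than the fraction of tokens removed. The difficulty highlighted in the abstract is choosing which tokens to drop without losing semantic information.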


Pages: 10
