Adaptive Masked Autoencoder Transformer for image classification

Cited by: 1
Authors
Chen, Xiangru [1 ,2 ]
Liu, Chenjing [1 ,2 ]
Hu, Peng
Lin, Jie [1 ,2 ]
Gong, Yunhong
Chen, Yingke [4 ]
Peng, Dezhong [1 ,3 ]
Geng, Xue [2 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore 138632, Singapore
[3] Sichuan Newstrong UHD Video Technol Co Ltd, Chengdu 610095, Peoples R China
[4] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England
Funding
National Natural Science Foundation of China;
Keywords
Vision transformer; Masked image modeling; Image classification;
DOI
10.1016/j.asoc.2024.111958
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision Transformers (ViTs) have exhibited exceptional performance across a broad spectrum of visual tasks. Nonetheless, their computational requirements often surpass those of prevailing CNN-based models. Token sparsity techniques have been employed as a means to alleviate this issue. Regrettably, these techniques often result in the loss of semantic information and subsequent deterioration in performance. In order to address these challenges, we propose the Adaptive Masked Autoencoder Transformer (AMAT), a masked image modeling-based method. AMAT integrates a novel adaptive masking mechanism and a training objective function for both pre-training and fine-tuning stages. Our primary objective is to reduce the complexity of Vision Transformer models while concurrently enhancing their final accuracy. Through experiments conducted on the ILSVRC-2012 dataset, our proposed method surpasses the original ViT while achieving up to 40% FLOPs savings. Moreover, AMAT outperforms the efficient DynamicViT model by 0.1% while saving 4% FLOPs. Furthermore, on the Places365 dataset, AMAT incurs only a 0.3% accuracy loss while saving 21% FLOPs compared to MAE. These findings effectively demonstrate the efficacy of AMAT in mitigating computational complexity while maintaining a high level of accuracy.
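The abstract describes AMAT as a masked image modeling method in the MAE family: a fraction of image patch tokens is masked out and only the visible tokens are encoded, which is where the FLOPs savings come from. As a rough illustration of that masking step only, here is a minimal sketch of standard MAE-style *uniform random* masking over a patch grid. This is not AMAT's adaptive mechanism (which the record does not specify — an adaptive policy would score patches rather than sample uniformly), and the function name is hypothetical.

```python
import random

def random_mask(num_patches, mask_ratio, seed=0):
    """MAE-style uniform random masking of patch tokens.

    Returns (kept, masked) index lists. Hypothetical helper for
    illustration; AMAT's adaptive masking would replace the uniform
    shuffle with a learned, content-dependent selection.
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # uniform random permutation of patch indices
    num_keep = int(num_patches * (1 - mask_ratio))
    kept = sorted(order[:num_keep])     # tokens the encoder sees
    masked = sorted(order[num_keep:])   # tokens to be reconstructed
    return kept, masked

# A 14x14 patch grid (196 tokens, as for ViT-B/16 on 224x224 input)
# with the 75% mask ratio commonly used in MAE pre-training.
kept, masked = random_mask(196, mask_ratio=0.75)
print(len(kept), len(masked))
```

Because the encoder processes only the kept tokens (49 of 196 here), its attention cost shrinks accordingly; an adaptive policy aims to choose *which* tokens to keep so that less semantic information is lost at a given ratio.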
Pages: 10
Related papers
50 items total
  • [31] BatmanNet: bi-branch masked graph transformer autoencoder for molecular representation
    Wang, Zhen
    Feng, Zheng
    Li, Yanjun
    Li, Bowen
    Wang, Yongrui
    Sha, Chulin
    He, Min
    Li, Xiaolin
    BRIEFINGS IN BIOINFORMATICS, 2024, 25 (01)
  • [32] Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face Anti-Spoofing
    Yu, Zitong
    Cai, Rizhao
    Cui, Yawen
    Liu, Xin
    Hu, Yongjian
    Kot, Alex C.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (11) : 5217 - 5238
  • [33] Dual-Branch Adaptive Convolutional Transformer for Hyperspectral Image Classification
    Wang, Chuanzhi
    Huang, Jun
    Lv, Mingyun
    Wu, Yongmei
    Qin, Ruiru
    REMOTE SENSING, 2024, 16 (09)
  • [34] Convolution-Transformer Adaptive Fusion Network for Hyperspectral Image Classification
    Li, Jiaju
    Xing, Hanfa
    Ao, Zurui
    Wang, Hefeng
    Liu, Wenkai
    Zhang, Anbing
    APPLIED SCIENCES-BASEL, 2023, 13 (01):
  • [35] Orthogonal autoencoder regression for image classification
    Yang, Zhangjing
    Wu, Xinxin
    Huang, Pu
    Zhang, Fanlong
    Wan, Minghua
    Lai, Zhihui
    INFORMATION SCIENCES, 2022, 618 : 400 - 416
  • [36] A Compositional Transformer Based Autoencoder for Image Style Transfer
    Feng, Jianxin
    Zhang, Geng
    Li, Xinhui
    Ding, Yuanming
    Liu, Zhiguo
    Pan, Chengsheng
    Deng, Siyuan
    Fang, Hui
    ELECTRONICS, 2023, 12 (05)
  • [37] HDR Image Reconstruction Algorithm Based on Masked Transformer
    Zhang, Zuheng
    Chen, Xiaodong
    Yi, Wang
    Cai, Huaiyu
    LASER & OPTOELECTRONICS PROGRESS, 2025, 62 (02)
  • [38] Image Retrieval Based on Vision Transformer and Masked Learning
    Li, Feng
    Pan, Huangsheng
    Sheng, Shouxiang
    Wang, Guodong
    JOURNAL OF DONGHUA UNIVERSITY (ENGLISH EDITION), 2023, 40 (05) : 539 - 547
  • [39] Green Hierarchical Vision Transformer for Masked Image Modeling
    Huang, Lang
    You, Shan
    Zheng, Mingkai
    Wang, Fei
    Qian, Chen
    Yamasaki, Toshihiko
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [40] MAE-EEG-Transformer: A transformer-based approach combining masked autoencoder and cross-individual data augmentation pre-training for EEG classification
    Cai, Miao
    Zeng, Yu
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 94