Adaptive Masked Autoencoder Transformer for image classification

Cited by: 1
Authors
Chen, Xiangru [1, 2]
Liu, Chenjing [1, 2]
Hu, Peng
Lin, Jie [1, 2]
Gong, Yunhong
Chen, Yingke [4]
Peng, Dezhong [1, 3]
Geng, Xue [2]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore 138632, Singapore
[3] Sichuan Newstrong UHD Video Technol Co Ltd, Chengdu 610095, Peoples R China
[4] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England
Funding
National Natural Science Foundation of China;
Keywords
Vision transformer; Masked image modeling; Image classification;
DOI
10.1016/j.asoc.2024.111958
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision Transformers (ViTs) have exhibited exceptional performance across a broad spectrum of visual tasks. Nonetheless, their computational requirements often surpass those of prevailing CNN-based models. Token sparsity techniques have been employed as a means to alleviate this issue. Regrettably, these techniques often result in the loss of semantic information and subsequent deterioration in performance. In order to address these challenges, we propose the Adaptive Masked Autoencoder Transformer (AMAT), a masked image modeling-based method. AMAT integrates a novel adaptive masking mechanism and a training objective function for both pre-training and fine-tuning stages. Our primary objective is to reduce the complexity of Vision Transformer models while concurrently enhancing their final accuracy. Through experiments conducted on the ILSVRC-2012 dataset, our proposed method surpasses the original ViT by achieving up to 40% FLOPs savings. Moreover, AMAT outperforms the efficient DynamicViT model by 0.1% while saving 4% FLOPs. Furthermore, on the Places365 dataset, AMAT achieves a 0.3% accuracy loss while saving 21% FLOPs compared to MAE. These findings effectively demonstrate the efficacy of AMAT in mitigating computational complexity while maintaining a high level of accuracy.
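The abstract does not describe AMAT's masking mechanism in detail, so the following is only a generic back-of-the-envelope sketch of why token sparsity reduces compute in a Vision Transformer. It is not the authors' method. The function names `vit_block_flops` and `flops_saving` are hypothetical, and the token count (197) and embedding width (768) are assumed ViT-Base-like values that do not come from this record.

```python
def vit_block_flops(num_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate multiply-accumulate count for one Transformer encoder block.

    Assumption: standard multi-head self-attention plus a two-layer MLP;
    layer norms, biases, and softmax are ignored.
    """
    projections = 4 * num_tokens * dim * dim       # Q, K, V and output projections
    attention = 2 * num_tokens * num_tokens * dim  # QK^T scores and weighted sum of V
    mlp = 2 * mlp_ratio * num_tokens * dim * dim   # expand and contract linear layers
    return projections + attention + mlp


def flops_saving(num_tokens: int, dim: int, keep_ratio: float) -> float:
    """Fraction of per-block compute saved when only `keep_ratio` of tokens are kept."""
    kept = max(1, round(num_tokens * keep_ratio))
    full = vit_block_flops(num_tokens, dim)
    sparse = vit_block_flops(kept, dim)
    return 1.0 - sparse / full


if __name__ == "__main__":
    # Assumed ViT-Base-like setting: 196 patch tokens + 1 class token, width 768.
    for ratio in (1.0, 0.75, 0.5):
        print(f"keep_ratio={ratio:.2f} -> saving={flops_saving(197, 768, ratio):.1%}")
```

Because the attention term grows quadratically with the number of tokens, the saving is slightly larger than the fraction of tokens dropped. Real savings also depend on where in the network tokens are removed, which this sketch does not model.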
Pages: 10
Related papers
50 results in total
  • [41] Dual-Phase Framework for Few-Shot Hyperspectral Image Classification With Spatiospectral Masked Autoencoder and Episode Training
    Khotimah, Wijayanti Nurul
    Bennamoun, Mohammed
    Boussaid, Farid
    Xu, Lian
    Sohel, Ferdous
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [42] SpectralMAE: Spectral Masked Autoencoder for Hyperspectral Remote Sensing Image Reconstruction
    Zhu, Lingxuan
    Wu, Jiaji
    Biao, Wang
    Liao, Yi
    Gu, Dandan
    SENSORS, 2023, 23 (07)
  • [43] Autoencoder and Masked Image Encoding-Based Attentional Pose Network
    Hu, Longhua
    Ma, Xiaoliang
    He, Cheng
    Wang, Lei
    Cheng, Jun
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT II, 2024, 14426 : 221 - 233
  • [44] Adaptive feature recalibration transformer for enhancing few-shot image classification
    Song, Wei
    Huang, Yaobin
    VISUAL COMPUTER, 2025,
  • [45] Adaptive Learnable Spectral-Spatial Fusion Transformer for Hyperspectral Image Classification
    Wang, Minhui
    Sun, Yaxiu
    Xiang, Jianhong
    Sun, Rui
    Zhong, Yu
    REMOTE SENSING, 2024, 16 (11)
  • [46] POLSAR IMAGE CLASSIFICATION WITH TRANSFORMER
    Zhang, Yunpeng
    Ferraioli, Giampaolo
    Pascazio, Vito
    Schirinzi, Gilda
    Vitale, Sergio
    Xing, Mengdao
    Yu, Hanwen
    IGARSS 2024-2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, IGARSS 2024, 2024, : 3414 - 3417
  • [47] SMAE-Fusion: Integrating saliency-aware masked autoencoder with hybrid attention transformer for infrared-visible image fusion
    Wang, Qinghua
    Li, Ziwei
    Zhang, Shuqi
    Luo, Yuhong
    Chen, Wentao
    Wang, Tianyun
    Chi, Nan
    Dai, Qionghai
    INFORMATION FUSION, 2025, 117
  • [48] Enhancing image classification using adaptive convolutional autoencoder-based snow avalanches algorithm
    Dhiravidachelvi, E.
    Devadas, T. Joshva
    Kumar, P. J. Sathish
    Pandi, S. Senthil
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (10) : 6867 - 6879
  • [49] Image quality in image classification: Adaptive image quality modification with adaptive classification
    Yan, Shuo
    Sayad, Saed
    Balke, Stephen T.
    COMPUTERS & CHEMICAL ENGINEERING, 2009, 33 (02) : 429 - 435
  • [50] Spectral-Spatial Blockwise Masked Transformer With Contrastive Multi-View Learning for Hyperspectral Image Classification
    Hu, Han
    Liu, Zhenhui
    Xu, Ziqing
    Wang, Haoyi
    Li, Xianju
    Han, Xu
    Peng, Jianyi
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT IV, 2025, 15034 : 480 - 494