Adaptive Masked Autoencoder Transformer for image classification

Cited by: 1
Authors
Chen, Xiangru [1 ,2 ]
Liu, Chenjing [1 ,2 ]
Hu, Peng
Lin, Jie [1 ,2 ]
Gong, Yunhong
Chen, Yingke [4 ]
Peng, Dezhong [1 ,3 ]
Geng, Xue [2 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore 138632, Singapore
[3] Sichuan Newstrong UHD Video Technol Co Ltd, Chengdu 610095, Peoples R China
[4] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England
Funding
National Natural Science Foundation of China;
Keywords
Vision transformer; Masked image modeling; Image classification;
DOI
10.1016/j.asoc.2024.111958
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision Transformers (ViTs) have exhibited exceptional performance across a broad spectrum of visual tasks. Nonetheless, their computational requirements often surpass those of prevailing CNN-based models. Token sparsity techniques have been employed as a means to alleviate this issue. Regrettably, these techniques often result in the loss of semantic information and subsequent deterioration in performance. In order to address these challenges, we propose the Adaptive Masked Autoencoder Transformer (AMAT), a masked image modeling-based method. AMAT integrates a novel adaptive masking mechanism and a training objective function for both pre-training and fine-tuning stages. Our primary objective is to reduce the complexity of Vision Transformer models while concurrently enhancing their final accuracy. Through experiments conducted on the ILSVRC-2012 dataset, our proposed method surpasses the original ViT while achieving up to 40% FLOPs savings. Moreover, AMAT outperforms the efficient DynamicViT model by 0.1% while saving 4% FLOPs. Furthermore, on the Places365 dataset, AMAT incurs only a 0.3% accuracy loss while saving 21% FLOPs compared to MAE. These findings effectively demonstrate the efficacy of AMAT in mitigating computational complexity while maintaining a high level of accuracy.
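The abstract describes AMAT as a masked image modeling method in the MAE family: a fraction of image patch tokens is masked out and only the visible tokens are encoded, which is where the FLOPs savings come from. As a rough illustration of that masking step only, here is a minimal sketch of standard MAE-style *uniform random* masking over a patch grid. This is not AMAT's adaptive mechanism (which the record does not specify — an adaptive policy would score patches rather than sample uniformly), and the function name is hypothetical.

```python
import random

def random_mask(num_patches, mask_ratio, seed=0):
    """MAE-style uniform random masking of patch tokens.

    Returns (kept, masked) index lists. Hypothetical helper for
    illustration; AMAT's adaptive masking would replace the uniform
    shuffle with a learned, content-dependent selection.
    """
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # uniform random permutation of patch indices
    num_keep = int(num_patches * (1 - mask_ratio))
    kept = sorted(order[:num_keep])     # tokens the encoder sees
    masked = sorted(order[num_keep:])   # tokens to be reconstructed
    return kept, masked

# A 14x14 patch grid (196 tokens, as for ViT-B/16 on 224x224 input)
# with the 75% mask ratio commonly used in MAE pre-training.
kept, masked = random_mask(196, mask_ratio=0.75)
print(len(kept), len(masked))
```

Because the encoder processes only the kept tokens (49 of 196 here), its attention cost shrinks accordingly; an adaptive policy aims to choose *which* tokens to keep so that less semantic information is lost at a given ratio.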
Pages: 10
Related papers
50 items total
  • [31] BatmanNet: bi-branch masked graph transformer autoencoder for molecular representation
    Wang, Zhen
    Feng, Zheng
    Li, Yanjun
    Li, Bowen
    Wang, Yongrui
    Sha, Chulin
    He, Min
    Li, Xiaolin
    BRIEFINGS IN BIOINFORMATICS, 2024, 25 (01)
  • [32] Rethinking Vision Transformer and Masked Autoencoder in Multimodal Face Anti-Spoofing
    Yu, Zitong
    Cai, Rizhao
    Cui, Yawen
    Liu, Xin
    Hu, Yongjian
    Kot, Alex C.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (11) : 5217 - 5238
  • [33] Dual-Branch Adaptive Convolutional Transformer for Hyperspectral Image Classification
    Wang, Chuanzhi
    Huang, Jun
    Lv, Mingyun
    Wu, Yongmei
    Qin, Ruiru
    REMOTE SENSING, 2024, 16 (09)
  • [34] Convolution-Transformer Adaptive Fusion Network for Hyperspectral Image Classification
    Li, Jiaju
    Xing, Hanfa
    Ao, Zurui
    Wang, Hefeng
    Liu, Wenkai
    Zhang, Anbing
    APPLIED SCIENCES-BASEL, 2023, 13 (01):
  • [35] Orthogonal autoencoder regression for image classification
    Yang, Zhangjing
    Wu, Xinxin
    Huang, Pu
    Zhang, Fanlong
    Wan, Minghua
    Lai, Zhihui
    INFORMATION SCIENCES, 2022, 618 : 400 - 416
  • [36] A Compositional Transformer Based Autoencoder for Image Style Transfer
    Feng, Jianxin
    Zhang, Geng
    Li, Xinhui
    Ding, Yuanming
    Liu, Zhiguo
    Pan, Chengsheng
    Deng, Siyuan
    Fang, Hui
    ELECTRONICS, 2023, 12 (05)
  • [37] HDR Image Reconstruction Algorithm Based on Masked Transformer
    Zhang, Zuheng
    Chen, Xiaodong
    Yi, Wang
    Cai, Huaiyu
    LASER & OPTOELECTRONICS PROGRESS, 2025, 62 (02)
  • [38] Image Retrieval Based on Vision Transformer and Masked Learning
    Li, Feng
    Pan, Huangsheng
    Sheng, Shouxiang
    Wang, Guodong
    JOURNAL OF DONGHUA UNIVERSITY (ENGLISH EDITION), 2023, 40 (05) : 539 - 547
  • [39] Green Hierarchical Vision Transformer for Masked Image Modeling
    Huang, Lang
    You, Shan
    Zheng, Mingkai
    Wang, Fei
    Qian, Chen
    Yamasaki, Toshihiko
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [40] MAE-EEG-Transformer: A transformer-based approach combining masked autoencoder and cross-individual data augmentation pre-training for EEG classification
    Cai, Miao
    Zeng, Yu
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 94