Adaptive Masked Autoencoder Transformer for image classification

Cited by: 1
Authors
Chen, Xiangru [1, 2]
Liu, Chenjing [1, 2]
Hu, Peng
Lin, Jie [1, 2]
Gong, Yunhong
Chen, Yingke [4]
Peng, Dezhong [1, 3]
Geng, Xue [2]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore 138632, Singapore
[3] Sichuan Newstrong UHD Video Technol Co Ltd, Chengdu 610095, Peoples R China
[4] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England
Funding
National Natural Science Foundation of China
Keywords
Vision transformer; Masked image modeling; Image classification
DOI
10.1016/j.asoc.2024.111958
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vision Transformers (ViTs) perform exceptionally well across a broad range of visual tasks, but their computational requirements often exceed those of prevailing CNN-based models. Token sparsity techniques have been used to alleviate this issue, yet they often lose semantic information and degrade performance as a result. To address these challenges, we propose the Adaptive Masked Autoencoder Transformer (AMAT), a method based on masked image modeling. AMAT integrates a novel adaptive masking mechanism and a training objective function for both the pre-training and fine-tuning stages. Our primary objective is to reduce the complexity of Vision Transformer models while also improving their final accuracy. In experiments on the ILSVRC-2012 dataset, our method surpasses the original ViT, achieving up to 40% FLOPs savings. Moreover, AMAT outperforms the efficient DynamicViT model by 0.1% while saving 4% of FLOPs. On the Places365 dataset, AMAT incurs a 0.3% accuracy loss while saving 21% of FLOPs compared to MAE. These findings demonstrate that AMAT reduces computational complexity while maintaining a high level of accuracy.
Pages: 10