Adaptive Masked Autoencoder Transformer for image classification

Cited by: 1
Authors
Chen, Xiangru [1, 2]
Liu, Chenjing [1, 2]
Hu, Peng
Lin, Jie [1, 2]
Gong, Yunhong
Chen, Yingke [4]
Peng, Dezhong [1, 3]
Geng, Xue [2]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore 138632, Singapore
[3] Sichuan Newstrong UHD Video Technol Co Ltd, Chengdu 610095, Peoples R China
[4] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England
Funding
National Natural Science Foundation of China
Keywords
Vision transformer; Masked image modeling; Image classification;
DOI
10.1016/j.asoc.2024.111958
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vision Transformers (ViTs) have exhibited exceptional performance across a broad spectrum of visual tasks. Nonetheless, their computational requirements often surpass those of prevailing CNN-based models. Token sparsity techniques have been employed as a means to alleviate this issue. Regrettably, these techniques often result in the loss of semantic information and subsequent deterioration in performance. In order to address these challenges, we propose the Adaptive Masked Autoencoder Transformer (AMAT), a masked image modeling-based method. AMAT integrates a novel adaptive masking mechanism and a training objective function for both pre-training and fine-tuning stages. Our primary objective is to reduce the complexity of Vision Transformer models while concurrently enhancing their final accuracy. Through experiments conducted on the ILSVRC-2012 dataset, our proposed method surpasses the original ViT by achieving up to 40% FLOPs savings. Moreover, AMAT outperforms the efficient DynamicViT model by 0.1% while saving 4% FLOPs. Furthermore, on the Places365 dataset, AMAT achieves a 0.3% accuracy loss while saving 21% FLOPs compared to MAE. These findings effectively demonstrate the efficacy of AMAT in mitigating computational complexity while maintaining a high level of accuracy.
Pages: 10
Related papers
50 in total
  • [21] Masked Auto-Encoding Spectral-Spatial Transformer for Hyperspectral Image Classification
    Ibanez, Damian
    Fernandez-Beltran, Ruben
    Pla, Filiberto
    Yokoya, Naoto
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [22] ACTN: Adaptive Coupling Transformer Network for Hyperspectral Image Classification
    Yang, Xiaofei
    Cao, Weijia
    Tang, Dong
    Zhou, Yicong
    Lu, Yao
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [23] Classification of masked image data
    Lis, Kamila
    Korycinski, Mateusz
    Ciecierski, Konrad A.
    PLOS ONE, 2021, 16 (07):
  • [24] IMAGE INPAINTING BY MSCSWIN TRANSFORMER ADVERSARIAL AUTOENCODER
    Chen, Bo-Wei
    Liu, Tsung-Jung
    Liu, Kuan-Hsien
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2040 - 2044
  • [25] An effective masked transformer network for image denoising
    Xu, Shaoping
    Xiao, Nan
    Tao, Wuyong
    Zhou, Changfei
    Xiong, Minghai
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (6-7) : 4997 - 5010
  • [26] Masked Diffusion Transformer is a Strong Image Synthesizer
    Gao, Shanghua
    Zhou, Pan
    Cheng, Ming-Ming
    Yan, Shuicheng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 23107 - 23116
  • [27] Contrastive Transformer Masked Image Hashing for Degraded Image Retrieval
    Shen, Xiaobo
    Cai, Haoyu
    Gong, Xiuwen
    Zheng, Yuhui
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 1218 - 1226
  • [28] Multi-View Masked Autoencoder for General Image Representation
    Ji, Seungbin
    Han, Sangkwon
    Rhee, Jongtae
    APPLIED SCIENCES-BASEL, 2023, 13 (22):
  • [29] A Band Selection Method With Masked Convolutional Autoencoder for Hyperspectral Image
    Liu, Yufei
    Li, Xiaorun
    Hua, Ziqiang
    Xia, Chaoqun
    Zhao, Liaoying
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [30] Deblurring Masked Autoencoder Is Better Recipe for Ultrasound Image Recognition
    Kang, Qingbo
    Gao, Jun
    Li, Kang
    Lao, Qicheng
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT I, 2023, 14220 : 352 - 362