Masked Autoencoders in Computer Vision: A Comprehensive Survey

Cited by: 3
Authors
Zhou, Zexian [1]
Liu, Xiaojing [1]
Affiliations
[1] Qinghai Univ, Dept Comp Technol & Applicat, Xining 810016, Peoples R China
Keywords
Computer vision survey; MAE; masked autoencoders; masked image modeling
DOI
10.1109/ACCESS.2023.3323383
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The masked autoencoder (MAE) is a Transformer-based deep learning method. Originally applied to images, it has since been extended to video, audio, and other temporal prediction tasks. In computer vision, MAE performs well in classification, prediction, and object detection tasks. In terms of applications, MAE has achieved notable results in medicine, geography, 3D point clouds, and machine troubleshooting. Since its introduction at the end of 2021, more than 300 related preprints have appeared, and MAE has featured prominently at top-tier computer vision conferences in 2022 and 2023. Given MAE's current popularity and future prospects, we conduct a relatively comprehensive survey of MAE, mainly covering articles that have been officially published so far. We review and classify the improvements to MAE and present representative applications in computer vision. Finally, we discuss possible future research directions and development areas based on the characteristics of MAE, hoping that our work can serve as a reference for future work on MAE.
Pages: 113560-113579
Page count: 20