Masked Autoencoders in Computer Vision: A Comprehensive Survey

Cited by: 3
Authors
Zhou, Zexian [1]
Liu, Xiaojing [1]
Affiliations
[1] Qinghai Univ, Dept Comp Technol & Applicat, Xining 810016, Peoples R China
Keywords
Computer vision survey; MAE; masked autoencoders; masked image modeling
DOI
10.1109/ACCESS.2023.3323383
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Masked autoencoders (MAE) are a Transformer-based deep learning method. Originally applied to images, MAE has since been extended to video, audio, and other temporal prediction tasks. In computer vision, MAE performs well in classification, prediction, and object detection tasks. In terms of specific applications, MAE has achieved notable results in medicine, geography, 3D point clouds, and machine fault diagnosis. Since its introduction at the end of 2021, more than 300 related preprints have appeared, and MAE has featured prominently at top-tier computer vision conferences in 2022 and 2023. Given MAE's current popularity and future prospects, we conduct a relatively comprehensive survey of MAE, mainly covering officially published articles to date. We review and classify the improvements to MAE and present representative applications in computer vision. Finally, we discuss possible future research directions and application areas based on the characteristics of MAE, hoping that our work can serve as a reference for future work on MAE.
Pages: 113560-113579
Page count: 20