Masked Autoencoders in Computer Vision: A Comprehensive Survey

Cited by: 3

Authors:

Zhou, Zexian [1]

Liu, Xiaojing [1]

Affiliations:

[1] Qinghai Univ, Dept Comp Technol & Applicat, Xining 810016, Peoples R China

Source:

IEEE Access | 2023 / Vol. 11

Keywords:

Computer vision survey; MAE; masked autoencoders; masked image modeling

DOI:

10.1109/ACCESS.2023.3323383

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

The masked autoencoder (MAE) is a Transformer-based deep learning method. Originally developed for images, it has since been extended to video, audio, and other temporal prediction tasks. In computer vision, MAE performs well in classification, prediction, and object detection tasks. In terms of specific applications, MAE has achieved notable results in medicine, geography, 3D point clouds, and machine troubleshooting. Since its introduction at the end of 2021, more than 300 related preprints have appeared, and MAE has featured prominently at top-tier computer vision conferences during 2022 and 2023. Given the current popularity of MAE and its prospects for future development, we conduct a relatively comprehensive survey of MAE, mainly covering officially published articles to date. We review and classify the improvements to MAE and present representative applications in computer vision. Finally, we discuss possible future research directions and areas of development based on the characteristics of MAE, in the hope that our work can serve as a reference for future work on MAE.

Pages: 113560-113579

Page count: 20
