Masked Autoencoders in Computer Vision: A Comprehensive Survey

Times cited: 3

Authors:

Zhou, Zexian [1]

Liu, Xiaojing [1]

Affiliation:

[1] Qinghai University, Department of Computer Technology and Applications, Xining 810016, People's Republic of China

Source:

IEEE ACCESS | 2023 / Vol. 11

Keywords:

Computer vision survey; MAE; masked autoencoders; masked image modeling

DOI:

10.1109/ACCESS.2023.3323383

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

The masked autoencoder (MAE) is a Transformer-based deep learning method. Originally developed for images, it has since been extended to video, audio, and other temporal prediction tasks. In computer vision, MAE performs well on classification, prediction, and object detection tasks, and it has produced many results in application areas such as medicine, geography, 3D point clouds, and machine fault diagnosis. Since its introduction at the end of 2021, more than 300 related preprints have appeared, and MAE-based work has featured prominently at top-tier computer vision conferences in 2022 and 2023. Given the current popularity of MAE and its prospects for future development, we present a relatively comprehensive survey of MAE that mainly covers articles formally published to date. We review and categorize the improvements made to MAE and present representative applications in computer vision. Finally, based on the characteristics of MAE, we discuss possible future research directions and areas of development, in the hope that this work can serve as a reference for future research on MAE.
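The abstract names MAE without describing its mechanism. As a minimal sketch, assuming the standard setup of the original MAE paper by He et al. (split an image into non-overlapping patches, hide a large random subset of them, 75% by default, and feed only the visible patches to the encoder), the plain-Python function below illustrates the random masking step. The function name `random_masking` and its return layout are hypothetical choices made for illustration, not code from this survey or from any library; the encoder, decoder, and reconstruction loss are omitted.

```python
import random
from typing import List, Optional, Tuple


def random_masking(num_patches: int, mask_ratio: float = 0.75,
                   seed: Optional[int] = None
                   ) -> Tuple[List[int], List[int], List[int]]:
    """Randomly choose which image patches stay visible (hypothetical helper).

    Returns:
        keep_idx:    indices of the visible patches (the encoder's input)
        mask:        one flag per patch in original order, 1 = masked, 0 = visible
        restore_idx: restore_idx[i] is patch i's position in the shuffled order,
                     usable to put decoder tokens back into the original layout
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    len_keep = int(num_patches * (1 - mask_ratio))
    # Sorting patch indices by per-patch random noise yields a random permutation.
    noise = [rng.random() for _ in range(num_patches)]
    shuffle_idx = sorted(range(num_patches), key=lambda i: noise[i])
    # Invert the permutation so each patch knows where it landed.
    restore_idx = [0] * num_patches
    for pos, patch in enumerate(shuffle_idx):
        restore_idx[patch] = pos
    # The first len_keep patches of the shuffled order stay visible.
    keep_idx = shuffle_idx[:len_keep]
    mask = [0 if restore_idx[i] < len_keep else 1 for i in range(num_patches)]
    return keep_idx, mask, restore_idx


# A 224x224 image cut into 16x16 patches gives (224 // 16) ** 2 = 196 patches.
num_patches = (224 // 16) ** 2
keep_idx, mask, restore_idx = random_masking(num_patches, mask_ratio=0.75, seed=0)
print(num_patches, len(keep_idx), sum(mask))  # 196 49 147
```

With a 75% ratio only 49 of the 196 patches reach the encoder, which is why this design keeps pre-training comparatively cheap; in this sketch, `restore_idx` is the piece a decoder would need to place mask tokens back at the hidden positions.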

Pages: 113560-113579

Number of pages: 20