A Survey on Deep Predictive Learning Based on Unlabeled Videos

Authors
Pan M.-T. [1]
Wang Y.-B. [1]
Zhu X.-M. [1]
Gao S.-Y. [1]
Long M.-S. [2]
Yang X.-K. [1]
Affiliations
[1] MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai
[2] School of Software, Tsinghua University, Beijing
Keywords
Computer vision; Deep learning; Model-based visual planning; Self-supervised learning; Video prediction
DOI
10.12263/DZXB.20211209
Abstract
Deep predictive learning based on video data (hereinafter "deep predictive learning") is a research direction within deep learning that lies at the intersection of computer vision and reinforcement learning. It is a key component of intelligent prediction and decision-making systems in weather forecasting, autonomous driving, robotics, and other scenarios, and has become an active area of machine learning research in recent years. Deep predictive learning follows the self-supervised learning paradigm, using internal constraints in unlabeled video data to learn the underlying spatiotemporal patterns. In this paper, we review existing deep learning techniques for predictive learning in detail. First, we summarize the research scope and application fields of deep predictive learning. Second, we present the datasets and evaluation metrics commonly used in this research field. Third, we summarize current mainstream deep predictive learning models from three perspectives: predictive models based on observation space, predictive models based on state space, and visual planning methods based on predictive models. Finally, we discuss open issues and future research directions in deep predictive learning. © 2022, Chinese Institute of Electronics. All rights reserved.
Pages: 869-886
Number of pages: 17