DANet: A spatio-temporal dynamics and Detail Aware Network for video prediction

Cited by: 0
Authors
Huang, Huilin [1 ]
Guan, YePeng [1 ,2 ,3 ]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai 200444, Peoples R China
[2] Minist Educ, Key Lab Adv Display & Syst Applicat, Shanghai 200072, Peoples R China
[3] Shanghai Univ, Key Lab Silicate Cultural Rel Conservat, Minist Educ, Shanghai 200444, Peoples R China
Keywords
Video prediction; Spatiotemporal dynamics; Detail information; Motion patterns
DOI
10.1016/j.neucom.2024.128023
CLC number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video prediction aims to forecast upcoming frames by modeling the complex spatiotemporal dynamics of given videos. However, most existing video prediction methods still perform sub-optimally in generating high-visual-quality future frames, for two reasons: 1) these methods struggle to reason about accurate future motion because they extract insufficient spatiotemporal correlations from the given frames; 2) the state transition units in previous works are complex, which inevitably results in the loss of spatial details. When videos contain variable motion patterns (e.g., rapid movement of objects) and complex spatial information (e.g., texture details), blurring artifacts and local absence of objects may occur in the predicted frames. In this work, to predict more accurate future motion and preserve more detail information, we propose an end-to-end trainable dual-branch video prediction framework, the spatiotemporal Dynamics and Detail Aware Network (DANet). Specifically, to predict future motion, we propose a SpatioTemporal Memory (ST-Memory) that learns motion evolution in the temporal domain from the given frames by transmitting deep features along a zigzag direction. To obtain adequate spatiotemporal correlations among frames, a MotionCell is constructed in the ST-Memory to facilitate the expansion of the receptive field, and spatiotemporal attention is employed in the ST-Memory to focus on the global variation of the given frames. Additionally, to preserve useful spatial details, we design a Spatial Details Memory (SD-Memory) that captures the global and local dependencies of the given frames at the pixel level. Extensive experiments on three public datasets, covering both synthetic and natural scenes, demonstrate that DANet achieves excellent video prediction performance compared with state-of-the-art methods.
In brief, DANet outperforms the state-of-the-art methods in terms of MSE by 3.1, 1.0 × 10^-2, and 14.3 × 10 on the three public benchmark datasets, respectively.
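The dual-branch idea the abstract describes (a motion branch with global spatiotemporal attention plus a detail-preserving branch, fused into the prediction) can be illustrated in toy form. The sketch below operates on NumPy feature maps; `spatiotemporal_attention` and `danet_step` are hypothetical stand-ins invented here for illustration and are not the authors' ST-Memory/SD-Memory implementation.

```python
import numpy as np

def spatiotemporal_attention(frames):
    """Toy global spatiotemporal attention (stand-in for ST-Memory).

    frames: (T, H, W) array of scalar frame features. Every pixel at
    every time step attends to every other pixel across all frames,
    loosely mimicking attention to the global variation of the clip.
    """
    T, H, W = frames.shape
    x = frames.reshape(T * H * W, 1)             # one token per pixel per frame
    scores = x @ x.T                             # pairwise token similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return (attn @ x).reshape(T, H, W)

def danet_step(frames):
    """Toy dual-branch fusion: motion branch + detail branch."""
    motion = spatiotemporal_attention(frames)    # motion branch output
    detail = frames[-1]                          # detail branch: keep last-frame pixels
    return 0.5 * motion[-1] + 0.5 * detail       # naive fusion -> next-frame guess
```

In the actual paper both branches are learned recurrent memories trained end-to-end; the fixed 0.5/0.5 fusion above is purely for illustration of how a motion estimate and preserved spatial detail can be combined into one predicted frame.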
Pages: 11
Related Papers
50 records in total
  • [1] Spatio-temporal prediction and reconstruction network for video anomaly detection
    Liu, Ting
    Zhang, Chengqing
    Niu, Xiaodong
    Wang, Liming
    PLOS ONE, 2022, 17 (05):
  • [2] Unsupervised Video Prediction Network with Spatio-temporal Deep Features
    Jin, Beibei
    Zhou, Rong
    Zhang, Zhisheng
    Dai, Min
    PROCEEDINGS OF THE 2018 25TH INTERNATIONAL CONFERENCE ON MECHATRONICS AND MACHINE VISION IN PRACTICE (M2VIP), 2018, : 19 - 24
  • [3] A Frequency-Aware Spatio-Temporal Network for Traffic Flow Prediction
    Peng, Shunfeng
    Shen, Yanyan
    Zhu, Yanmin
    Chen, Yuting
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2019), PT II, 2019, 11447 : 697 - 712
  • [4] Spatio-Temporal Self-Attention Network for Video Saliency Prediction
    Wang, Ziqiang
    Liu, Zhi
    Li, Gongyang
    Wang, Yang
    Zhang, Tianhong
    Xu, Lihua
    Wang, Jijun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1161 - 1174
  • [5] Exploring the Spatio-Temporal Aware Graph for video captioning
    Xue, Ping
    Zhou, Bing
    IET COMPUTER VISION, 2022, 16 (05) : 456 - 467
  • [6] Flexible Spatio-Temporal Networks for Video Prediction
    Lu, Chaochao
    Hirsch, Michael
    Scholkopf, Bernhard
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 2137 - 2145
  • [7] Spatio-Temporal Transformer Network for Video Restoration
    Kim, Tae Hyun
    Sajjadi, Mehdi S. M.
    Hirsch, Michael
    Schoelkopf, Bernhard
    COMPUTER VISION - ECCV 2018, PT III, 2018, 11207 : 111 - 127
  • [8] Spatio-Temporal Detail Information Retrieval for Compressed Video Quality Enhancement
    Luo, Dengyan
    Ye, Mao
    Li, Shuai
    Zhu, Ce
    Li, Xue
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 6808 - 6820
  • [9] Dast-Net: Depth-Aware Spatio-Temporal Network for Video Deblurring
    Zhu, Qi
    Xiao, Zeyu
    Huang, Jie
    Zhao, Feng
PROCEEDINGS - IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, 2022, 2022-July
  • [10] SiamMAST: Siamese motion-aware spatio-temporal network for video action recognition
    Lu, Xuemin
    Quan, Wei
    Reformat, Marek
    Zhao, Haiquan
    Chen, Jim X.
    THE VISUAL COMPUTER, 2024, 40 : 3163 - 3181