DANet: A spatio-temporal dynamics and Detail Aware Network for video prediction

Authors
Huang, Huilin [1]
Guan, YePeng [1, 2, 3]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai 200444, Peoples R China
[2] Minist Educ, Key Lab Adv Display & Syst Applicat, Shanghai 200072, Peoples R China
[3] Shanghai Univ, Key Lab Silicate Cultural Rel Conservat, Minist Educ, Shanghai 200444, Peoples R China
Keywords
Video prediction; Spatiotemporal dynamics; Details information; Motion patterns
DOI
10.1016/j.neucom.2024.128023
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video prediction aims to predict upcoming frames by modeling the complex spatiotemporal dynamics of given videos. However, most existing video prediction methods still perform sub-optimally in generating future frames of high visual quality. There are two reasons: 1) these methods struggle to infer accurate future motion because they extract insufficient spatiotemporal correlations from the given frames; 2) the state transition units in previous works are complex, which inevitably results in the loss of spatial details. When the videos contain variable motion patterns (e.g. rapid movement of objects) and complex spatial information (e.g. texture details), blurring artifacts and local absence of objects may occur in the predicted frames. In this work, to predict more accurate future motion and preserve more detail information, we propose an end-to-end trainable dual-branch video prediction framework, the spatiotemporal Dynamics and Detail Aware Network (DANet). Specifically, to predict future motion, we propose a SpatioTemporal Memory (ST-Memory) that learns motion evolution in the temporal domain from the given frames by transmitting the deep features along a zigzag direction. To obtain adequate spatiotemporal correlations among frames, the MotionCell is constructed in the ST-Memory to expand the receptive field. Spatiotemporal attention is used in the ST-Memory to focus on the global variation of the given frames. Additionally, to preserve useful spatial details, we design the Spatial Details Memory (SD-Memory) to capture the global and local dependencies of the given frames at the pixel level. Extensive experiments conducted on three public datasets, both synthetic and natural, demonstrate that DANet performs excellently for video prediction compared with state-of-the-art methods.
In brief, DANet outperforms the state-of-the-art methods in terms of MSE by 3.1, 1.0 × 10^-2 and 14.3 × 10 on three public benchmark datasets, respectively.
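For readers unfamiliar with the metric, the MSE comparison above can be illustrated with a minimal sketch. The abstract does not state how the error is normalized, so this sketch assumes one convention often used on video prediction benchmarks: squared pixel errors are summed within each frame and then averaged over the predicted frames. The function names `frame_mse` and `sequence_mse` are hypothetical and do not come from the paper.

```python
def frame_mse(pred, target):
    """Sum of squared pixel errors for one frame.

    Frames are nested lists (rows of pixel intensities).
    Assumption: errors are summed over pixels, not averaged.
    """
    return sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )


def sequence_mse(pred_frames, target_frames):
    """Average the per-frame summed squared error over all predicted frames."""
    if not pred_frames or len(pred_frames) != len(target_frames):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(frame_mse(p, t) for p, t in zip(pred_frames, target_frames))
    return total / len(pred_frames)


# Two tiny 2x2 "frames": the first prediction has two wrong pixels,
# the second matches its target exactly.
predicted = [[[0.0, 1.0], [1.0, 0.0]], [[0.5, 0.5], [0.5, 0.5]]]
ground_truth = [[[0.0, 0.0], [1.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
score = sequence_mse(predicted, ground_truth)
```

Lower values mean the predicted frames are closer to the ground truth. Under a different normalization (for example, averaging over pixels as well) the absolute numbers change, so scores are only comparable between methods evaluated with the same convention.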
Pages: 11