DANet: A spatio-temporal dynamics and Detail Aware Network for video prediction

Cited by: 0
Authors
Huang, Huilin [1 ]
Guan, YePeng [1 ,2 ,3 ]
Institutions
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai 200444, Peoples R China
[2] Minist Educ, Key Lab Adv Display & Syst Applicat, Shanghai 200072, Peoples R China
[3] Shanghai Univ, Key Lab Silicate Cultural Rel Conservat, Minist Educ, Shanghai 200444, Peoples R China
Keywords
Video prediction; Spatiotemporal dynamics; Detail information; Motion patterns
DOI
10.1016/j.neucom.2024.128023
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Video prediction aims to predict upcoming future frames by modeling the complex spatiotemporal dynamics of given videos. However, most existing video prediction methods still perform sub-optimally when generating high-visual-quality future frames, for two reasons: 1) they struggle to reason about accurate future motion because they extract insufficient spatiotemporal correlations from the given frames; and 2) the state transition units in previous works are complex, which inevitably results in the loss of spatial details. When videos contain variable motion patterns (e.g., rapid movement of objects) and complex spatial information (e.g., texture details), blurring artifacts and local absence of objects may occur in the predicted frames. In this work, to predict more accurate future motion and preserve more detail information, we propose an end-to-end trainable dual-branch video prediction framework, the spatiotemporal Dynamics and Detail Aware Network (DANet). Specifically, to predict future motion, we propose a SpatioTemporal Memory (ST-Memory) that learns motion evolution in the temporal domain from the given frames by transmitting deep features along a zigzag direction. To obtain adequate spatiotemporal correlations among frames, a MotionCell is constructed in the ST-Memory to facilitate the expansion of the receptive field, and spatiotemporal attention is employed in the ST-Memory to focus on the global variation of the given frames. Additionally, to preserve useful spatial details, we design a Spatial Details Memory (SD-Memory) to capture the global and local dependencies of the given frames at the pixel level. Extensive experiments conducted on three public datasets, covering both synthetic and natural scenes, demonstrate that DANet achieves excellent performance for video prediction compared with state-of-the-art methods.
In brief, DANet outperforms the state-of-the-art methods in terms of MSE by 3.1, 1.0 × 10^-2 and 14.3 × 10 on the three public benchmark datasets, respectively.
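The dual-branch idea described in the abstract (a motion pathway plus a detail-preserving pathway, fused into one prediction, evaluated by MSE) can be illustrated with a toy numerical sketch. Everything below — the function names, the linear-extrapolation motion model, the pass-through detail pathway, and the fusion weight `alpha` — is a hypothetical illustration, not DANet's actual learned ST-Memory or SD-Memory.

```python
import numpy as np

def motion_branch(frames):
    """Toy motion extrapolation: linearly extrapolate from the last
    two frames (a crude stand-in for a learned motion memory)."""
    return frames[-1] + (frames[-1] - frames[-2])

def detail_branch(frames):
    """Toy detail pathway: reuse the last frame's pixels verbatim
    (a crude stand-in for a detail-preserving memory)."""
    return frames[-1]

def predict_next(frames, alpha=0.8):
    """Fuse the two branches; alpha weights motion vs. detail."""
    return alpha * motion_branch(frames) + (1 - alpha) * detail_branch(frames)

def mse(pred, target):
    """Mean squared error, the metric quoted in the abstract."""
    return float(np.mean((pred - target) ** 2))

# Synthetic sequence: a 4x4 frame whose brightness ramps linearly in time.
frames = [np.full((4, 4), float(t)) for t in range(3)]  # t = 0, 1, 2
pred = predict_next(frames, alpha=1.0)  # pure motion branch
print(mse(pred, np.full((4, 4), 3.0)))  # ramp is linear, so error is 0.0
```

On this constant-velocity toy sequence the motion branch alone is exact; mixing in the detail branch (`alpha < 1`) trades motion accuracy for fidelity to the last observed frame, which is the kind of tension the paper's two memories are designed to resolve.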
Pages: 11
Related papers (50 total)
  • [21] Spatio-Temporal Filter Adaptive Network for Video Deblurring. Zhou, Shangchen; Zhang, Jiawei; Pan, Jinshan; Xie, Haozhe; Zuo, Wangmeng; Ren, Jimmy. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 2482-2491.
  • [22] Spatio-Temporal Attention Network for Video Instance Segmentation. Liu, Xiaoyu; Ren, Haibing; Ye, Tingmeng. 2019 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019: 725-727.
  • [23] Adaptive Spatio-Temporal Convolutional Network for Video Deblurring. Duan, Fengzhi; Yao, Hongxun. Image and Graphics (ICIG 2021), Pt III, 2021, 12890: 777-788.
  • [24] Spatio-Temporal Convolution-Attention Video Network. Diba, Ali; Sharma, Vivek; Arzani, Mohammad M.; Van Gool, Luc. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2023: 859-869.
  • [25] Spatio-Temporal Deformable Attention Network for Video Deblurring. Zhang, Huicong; Xie, Haozhe; Yao, Hongxun. arXiv, 2022.
  • [26] Spatio-Temporal Inference Transformer Network for Video Inpainting. Tudavekar, Gajanan; Saraf, Santosh S.; Patil, Sanjay R. International Journal of Image and Graphics, 2023, 23(01).
  • [27] Video Anomaly Detection via Prediction Network with Enhanced Spatio-Temporal Memory Exchange. Shen, Guodong; Ouyang, Yuqi; Sanchez, Victor. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 3728-3732.
  • [28] A Mobility Aware Network Traffic Prediction Model Based on Dynamic Graph Attention Spatio-Temporal Network. Jin, Zilong; Qian, Jun; Kong, Zhixiang; Pan, Chengsheng. Computer Networks, 2023, 235.
  • [29] Spatio-Temporal Context-Aware Collaborative QoS Prediction. Zhou, Qimin; Wu, Hao; Yue, Kun; Hsu, Ching-Hsien. Future Generation Computer Systems, 2019, 100: 46-57.
  • [30] Spatio-Temporal Interaction Aware and Trajectory Distribution Aware Graph Convolution Network for Pedestrian Multimodal Trajectory Prediction. Wang, Ruiping; Song, Xiao; Hu, Zhijian; Cui, Yong. IEEE Transactions on Instrumentation and Measurement, 2023, 72.