Spatio-Temporal Convolutional Neural Network for Enhanced Inter Prediction in Video Coding

Authors
Merkle, Philipp [1]
Winken, Martin [1]
Pfaff, Jonathan [1]
Schwarz, Heiko [1, 2]
Marpe, Detlev [1]
Wiegand, Thomas [3, 4]
Affiliations
[1] Heinrich Hertz Inst Nachrichtentech Berlin GmbH, Fraunhofer Inst Telecommun, Video Commun & Applicat Dept, D-10587 Berlin, Germany
[2] Free Univ Berlin, Inst Comp Sci, D-14195 Berlin, Germany
[3] Heinrich Hertz Inst Nachrichtentech Berlin GmbH, Fraunhofer Inst Telecommun, D-10587 Berlin, Germany
[4] Tech Univ Berlin, Dept Telecommun Syst, D-10587 Berlin, Germany
Keywords
Convolutional neural networks; Decoding; Standards; Video coding; Image coding; Vectors; Shape; Inter prediction; convolutional neural network; intra reference samples; versatile video coding standard; motion compensation
DOI
10.1109/TIP.2024.3446228
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a convolutional neural network (CNN)-based enhancement to inter prediction in Versatile Video Coding (VVC). Our approach aims at improving the prediction signal of inter blocks with a residual CNN that incorporates spatial and temporal reference samples. It is motivated by the theoretical consideration that neural network-based methods have a higher degree of signal adaptivity than conventional signal processing methods and that spatially neighboring reference samples have the potential to improve the prediction signal by adapting it to the reconstructed signal in its immediate vicinity. We show that adding a polyphase decomposition stage to the CNN results in a significantly better trade-off between computational complexity and coding performance. Incorporating spatial reference samples in the inter prediction process is challenging: The fact that the input of the CNN for one block may depend on the output of the CNN for preceding blocks prohibits parallel processing. We solve this by introducing a novel signal plane that contains specifically constrained reference samples, enabling parallel decoding while maintaining a high compression efficiency. Overall, experimental results show average bit rate savings of 4.07% and 3.47% for the random access (RA) and low-delay B (LB) configurations of the JVET common test conditions, respectively.
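The polyphase decomposition mentioned in the abstract can be illustrated with a minimal sketch. The record does not describe how the stage is built into the CNN, so the 2x2 decomposition factor, the function names `polyphase_decompose` and `polyphase_reconstruct`, and the plain list-of-rows block layout below are all assumptions chosen for illustration, not the authors' implementation. The sketch only shows the general idea: a block is split into lower-resolution phase components (one per sampling offset), which a network could process as separate channels at reduced spatial size, and the split is exactly invertible.

```python
def polyphase_decompose(block, factor=2):
    """Split a 2-D block (list of rows) into factor*factor phase components.

    Assumption: factor=2 is an illustrative choice, not taken from the paper.
    """
    height, width = len(block), len(block[0])
    if height % factor or width % factor:
        raise ValueError("block dimensions must be multiples of factor")
    # Phase (dy, dx) keeps rows dy, dy+factor, ... and columns dx, dx+factor, ...
    return [
        [row[dx::factor] for row in block[dy::factor]]
        for dy in range(factor)
        for dx in range(factor)
    ]


def polyphase_reconstruct(phases, factor=2):
    """Interleave the phase components back into the full-resolution block."""
    sub_h, sub_w = len(phases[0]), len(phases[0][0])
    block = [[0] * (sub_w * factor) for _ in range(sub_h * factor)]
    for index, phase in enumerate(phases):
        dy, dx = divmod(index, factor)  # same ordering as in polyphase_decompose
        for y, row in enumerate(phase):
            for x, sample in enumerate(row):
                block[y * factor + dy][x * factor + dx] = sample
    return block


# Hypothetical 4x4 block of sample values 0..15 in raster order.
block = [[4 * y + x for x in range(4)] for y in range(4)]
phases = polyphase_decompose(block)
```

With a factor of 2, each of the four phases has a quarter of the samples, so convolutions applied to the phases touch a quarter as many spatial positions per layer. This is one plausible reading of why such a stage might improve the complexity versus performance trade-off the abstract reports.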
Pages: 4738-4752
Number of pages: 15