Spatio-Temporal Convolutional Neural Network for Enhanced Inter Prediction in Video Coding

Authors
Merkle, Philipp [1]
Winken, Martin [1]
Pfaff, Jonathan [1]
Schwarz, Heiko [1, 2]
Marpe, Detlev [1]
Wiegand, Thomas [3, 4]
Affiliations
[1] Heinrich Hertz Inst Nachrichtentech Berlin GmbH, Fraunhofer Inst Telecommun, Video Commun & Applicat Dept, D-10587 Berlin, Germany
[2] Free Univ Berlin, Inst Comp Sci, D-14195 Berlin, Germany
[3] Heinrich Hertz Inst Nachrichtentech Berlin GmbH, Fraunhofer Inst Telecommun, D-10587 Berlin, Germany
[4] Tech Univ Berlin, Dept Telecommun Syst, D-10587 Berlin, Germany
Keywords
Convolutional neural networks; Decoding; Standards; Video coding; Image coding; Vectors; Shape; Inter prediction; Convolutional neural network; Intra reference samples; Versatile Video Coding standard; Motion compensation
DOI
10.1109/TIP.2024.3446228
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a convolutional neural network (CNN)-based enhancement to inter prediction in Versatile Video Coding (VVC). Our approach aims at improving the prediction signal of inter blocks with a residual CNN that incorporates spatial and temporal reference samples. It is motivated by the theoretical consideration that neural network-based methods have a higher degree of signal adaptivity than conventional signal processing methods and that spatially neighboring reference samples have the potential to improve the prediction signal by adapting it to the reconstructed signal in its immediate vicinity. We show that adding a polyphase decomposition stage to the CNN results in a significantly better trade-off between computational complexity and coding performance. Incorporating spatial reference samples in the inter prediction process is challenging: The fact that the input of the CNN for one block may depend on the output of the CNN for preceding blocks prohibits parallel processing. We solve this by introducing a novel signal plane that contains specifically constrained reference samples, enabling parallel decoding while maintaining a high compression efficiency. Overall, experimental results show average bit rate savings of 4.07% and 3.47% for the random access (RA) and low-delay B (LB) configurations of the JVET common test conditions, respectively.
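The polyphase decomposition mentioned in the abstract can be illustrated with a minimal sketch. This is only the generic signal-processing operation (also known as space-to-depth), not the network layout used in the paper, which the abstract does not specify. The function names, the decomposition factor of 2, and the 4x4 sample block are assumptions chosen for illustration. Each phase has a quarter of the spatial positions of the input block, which is the usual reason such a stage lowers the cost of the convolutional layers that follow it.

```python
def polyphase_decompose(block, factor=2):
    """Split a 2-D block (list of rows) into factor*factor subsampled phases.

    Hypothetical helper for illustration; not taken from the paper.
    """
    height, width = len(block), len(block[0])
    if height % factor or width % factor:
        raise ValueError("block dimensions must be multiples of factor")
    phases = []
    for dy in range(factor):
        for dx in range(factor):
            # Phase (dy, dx) keeps every factor-th sample starting at (dy, dx).
            phases.append([row[dx::factor] for row in block[dy::factor]])
    return phases


def polyphase_reconstruct(phases, factor=2):
    """Inverse operation: interleave the phases back into one block."""
    sub_h, sub_w = len(phases[0]), len(phases[0][0])
    block = [[0] * (sub_w * factor) for _ in range(sub_h * factor)]
    for index, phase in enumerate(phases):
        dy, dx = divmod(index, factor)
        for y, row in enumerate(phase):
            for x, sample in enumerate(row):
                block[y * factor + dy][x * factor + dx] = sample
    return block


# Assumed 4x4 block of sample values 0..15 in raster order.
block = [[4 * y + x for x in range(4)] for y in range(4)]
phases = polyphase_decompose(block)
restored = polyphase_reconstruct(phases)
```

With a factor of 2, the 4x4 block yields four 2x2 phases, and interleaving them restores the original block exactly, so the decomposition itself loses no information.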
Pages: 4738-4752
Number of pages: 15