Spatio-Temporal Convolutional Neural Network for Enhanced Inter Prediction in Video Coding

Cited by: 0

Authors:

Merkle, Philipp [1]

Winken, Martin [1]

Pfaff, Jonathan [1]

Schwarz, Heiko [1,2]

Marpe, Detlev [1]

Wiegand, Thomas [3,4]

Affiliations:

[1] Heinrich Hertz Inst Nachrichtentech Berlin GmbH, Fraunhofer Inst Telecommun, Video Commun & Applicat Dept, D-10587 Berlin, Germany

[2] Free Univ Berlin, Inst Comp Sci, D-14195 Berlin, Germany

[3] Heinrich Hertz Inst Nachrichtentech Berlin GmbH, Fraunhofer Inst Telecommun, D-10587 Berlin, Germany

[4] Tech Univ Berlin, Dept Telecommun Syst, D-10587 Berlin, Germany

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2024 / Vol. 33

Keywords:

Convolutional neural networks; Decoding; Standards; Video coding; Image coding; Vectors; Shape; Inter prediction; convolutional neural network; intra reference samples; versatile video coding standard; MOTION COMPENSATION

DOI:

10.1109/TIP.2024.3446228

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper presents a convolutional neural network (CNN)-based enhancement to inter prediction in Versatile Video Coding (VVC). Our approach aims at improving the prediction signal of inter blocks with a residual CNN that incorporates spatial and temporal reference samples. It is motivated by the theoretical consideration that neural network-based methods have a higher degree of signal adaptivity than conventional signal processing methods and that spatially neighboring reference samples have the potential to improve the prediction signal by adapting it to the reconstructed signal in its immediate vicinity. We show that adding a polyphase decomposition stage to the CNN results in a significantly better trade-off between computational complexity and coding performance. Incorporating spatial reference samples in the inter prediction process is challenging: The fact that the input of the CNN for one block may depend on the output of the CNN for preceding blocks prohibits parallel processing. We solve this by introducing a novel signal plane that contains specifically constrained reference samples, enabling parallel decoding while maintaining a high compression efficiency. Overall, experimental results show average bit rate savings of 4.07% and 3.47% for the random access (RA) and low-delay B (LB) configurations of the JVET common test conditions, respectively.
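The abstract mentions a polyphase decomposition stage but does not describe how it is built. As a rough illustration of the general idea only, and not of the authors' network, the sketch below splits a block into its phase components with NumPy, assuming a decomposition factor of 2 in each direction. The function names `polyphase_decompose` and `polyphase_reconstruct` are hypothetical.

```python
import numpy as np


def polyphase_decompose(block: np.ndarray, factor: int = 2) -> np.ndarray:
    """Split an (H, W) block into factor*factor phase components.

    Phase index i * factor + j holds the samples block[i::factor, j::factor],
    so the result has shape (factor*factor, H // factor, W // factor).
    The factor of 2 is an assumption made for illustration.
    """
    h, w = block.shape
    if h % factor or w % factor:
        raise ValueError("block dimensions must be divisible by factor")
    # Axes after reshape: (row group, row phase, column group, column phase).
    grid = block.reshape(h // factor, factor, w // factor, factor)
    # Move the two phase axes to the front and merge them into one channel axis.
    return grid.transpose(1, 3, 0, 2).reshape(factor * factor, h // factor, w // factor)


def polyphase_reconstruct(phases: np.ndarray, factor: int = 2) -> np.ndarray:
    """Inverse of polyphase_decompose: interleave the phases back into one block."""
    _, hs, ws = phases.shape
    grid = phases.reshape(factor, factor, hs, ws)
    return grid.transpose(2, 0, 3, 1).reshape(hs * factor, ws * factor)


if __name__ == "__main__":
    block = np.arange(16).reshape(4, 4)
    phases = polyphase_decompose(block)
    print(phases.shape)
    print(np.array_equal(polyphase_reconstruct(phases), block))
```

With a factor of 2, each of the four phase components has a quarter as many spatial positions as the original block, so a convolutional layer operating on the stacked phases visits fewer positions per block. Whether the paper uses this exact arrangement is not stated in the abstract.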

Pages: 4738-4752

Number of pages: 15
