End-to-End Neural Video Coding Using a Compound Spatiotemporal Representation

Cited by: 10

Authors:

Liu, Haojie [1]

Lu, Ming [1]

Chen, Zhiqi [2]

Cao, Xun [1]

Ma, Zhan [1]

Wang, Yao [2]

Affiliations:

[1] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210093, Jiangsu, Peoples R China

[2] NYU, Tandon Sch Engn, New York, NY 11201 USA

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2022 / Vol. 32 / No. 08

Funding:

National Natural Science Foundation of China;

Keywords:

Image coding; Spatiotemporal phenomena; Decoding; Chemical reactors; Video coding; Feature extraction; Optical flow; Learnt video coding; spatiotemporal recurrent neural network; optical flow; deformable convolutions; video prediction; COMPRESSION;

DOI:

10.1109/TCSVT.2022.3150014

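Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract: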

Recent years have witnessed rapid advances in learnt video coding. Most algorithms have relied solely on vector-based motion representation and resampling (e.g., optical flow based bilinear sampling) to exploit inter-frame redundancy. Despite the great success of adaptive kernel-based resampling (e.g., adaptive convolutions and deformable convolutions) in video prediction for uncompressed videos, integrating such approaches with rate-distortion optimization for inter-frame coding has been less successful. Recognizing that each resampling solution offers unique advantages in regions with different motion and texture characteristics, we propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by these two approaches. Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module using information from the current and multiple past frames. We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements, and combine them adaptively to achieve more accurate inter prediction. Experiments show that our proposed inter coding system can provide better motion-compensated prediction and is more robust to occlusions and complex motions. Together with a jointly trained intra coder and residual coder, the overall learnt hybrid coder yields state-of-the-art coding efficiency in the low-delay scenario, compared to the traditional H.264/AVC and H.265/HEVC, as well as recently published learning-based methods, in terms of both PSNR and MS-SSIM metrics.
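The adaptive combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the blending formula, so this assumes a per-pixel convex combination of the two predictions weighted by a mode selection map, plus an additive texture term. The function name `hybrid_blend` and all parameter names are hypothetical and do not come from the paper.

```python
import numpy as np

def hybrid_blend(pred_vector, pred_kernel, mode_map, texture=None):
    """Blend two inter predictions per pixel (illustrative assumption only).

    pred_vector: prediction from vector-based resampling (e.g., flow warping)
    pred_kernel: prediction from adaptive kernel-based resampling
    mode_map:    per-pixel weight; 1 favors pred_vector, 0 favors pred_kernel
    texture:     optional additive texture enhancement
    """
    # Keep the weights in [0, 1] so the blend is a convex combination.
    weights = np.clip(mode_map, 0.0, 1.0)
    blended = weights * pred_vector + (1.0 - weights) * pred_kernel
    if texture is not None:
        blended = blended + texture
    return blended
```

With a mode map of all ones the output reduces to the vector-based prediction, and with all zeros it reduces to the kernel-based prediction. In the paper the predictions and maps are produced by a learnt decoder, which this sketch does not model.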

Pages: 5650 - 5662

Number of pages: 13
