End-to-End Neural Video Coding Using a Compound Spatiotemporal Representation

Cited by: 10
Authors
Liu, Haojie [1]
Lu, Ming [1]
Chen, Zhiqi [2]
Cao, Xun [1]
Ma, Zhan [1]
Wang, Yao [2]
Affiliations
[1] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210093, Jiangsu, Peoples R China
[2] NYU, Tandon Sch Engn, New York, NY 11201 USA
Funding
National Natural Science Foundation of China;
Keywords
Image coding; Spatiotemporal phenomena; Decoding; Chemical reactors; Video coding; Feature extraction; Optical flow; Learnt video coding; spatiotemporal recurrent neural network; optical flow; deformable convolutions; video prediction; COMPRESSION;
DOI
10.1109/TCSVT.2022.3150014
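Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract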
Recent years have witnessed rapid advances in learnt video coding. Most algorithms have relied solely on vector-based motion representation and resampling (e.g., optical-flow-based bilinear sampling) to exploit inter-frame redundancy. Despite the great success of adaptive kernel-based resampling (e.g., adaptive convolutions and deformable convolutions) in video prediction for uncompressed videos, integrating such approaches with rate-distortion optimization for inter-frame coding has been less successful. Recognizing that each resampling solution offers unique advantages in regions with different motion and texture characteristics, we propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by these two approaches. Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module using information from the current and multiple past frames. We further design a one-to-many decoder pipeline that generates multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps, and texture enhancements, and combines them adaptively to achieve more accurate inter prediction. Experiments show that our proposed inter coding system provides better motion-compensated prediction and is more robust to occlusions and complex motions. Together with a jointly trained intra coder and residual coder, the overall learnt hybrid coder yields state-of-the-art coding efficiency in the low-delay scenario compared to the traditional H.264/AVC and H.265/HEVC, as well as recently published learning-based methods, in terms of both PSNR and MS-SSIM.
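The adaptive combination described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: in the paper the flow, the kernel-based prediction, the mode selection map, and the texture enhancement are all produced by learned networks, whereas here they are simply passed in as arrays. The function names `bilinear_warp` and `hybrid_prediction`, the single-channel frames, and the convex per-pixel blending rule are assumptions made for illustration only.

```python
import numpy as np


def bilinear_warp(ref, flow):
    """Warp a 2-D reference frame with a per-pixel flow field.

    ref:  (H, W) array, the previously decoded frame.
    flow: (H, W, 2) array of (dx, dy) offsets into the reference frame.
    Sampling positions are clamped to the image border (an assumption).
    """
    h, w = ref.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - wx) * ref[y0, x0] + wx * ref[y0, x1]
    bottom = (1 - wx) * ref[y1, x0] + wx * ref[y1, x1]
    return (1 - wy) * top + wy * bottom


def hybrid_prediction(flow_pred, kernel_pred, mode_map, texture=None):
    """Blend two predictions per pixel (hypothetical blending rule).

    mode_map in [0, 1] is the weight given to the vector-based prediction;
    the kernel-based prediction receives the complementary weight.
    An optional texture term is added on top of the blend.
    """
    mode_map = np.clip(mode_map, 0.0, 1.0)
    blended = mode_map * flow_pred + (1.0 - mode_map) * kernel_pred
    if texture is not None:
        blended = blended + texture
    return blended
```

A mode map near 1 selects the flow-warped prediction and a value near 0 selects the kernel-based one, which is one simple way to let each resampling method dominate in the regions where it performs better.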
Pages: 5650 - 5662
Page count: 13
Related papers
50 in total
  • [21] End-To-End Security for Video Distribution
    Boho, Andras
    Van Wallendael, Glenn
    Dooms, Ann
    De Cock, Jan
    Braeckman, Geert
    Schelkens, Peter
    Preneel, Bart
    Van de Walle, Rik
    IEEE SIGNAL PROCESSING MAGAZINE, 2013, 30 (02) : 97 - 107
  • [22] Retargeting Video With an End-to-End Framework
    Le, Thi-Ngoc-Hanh
    Huang, HuiGuang
    Chen, Yi-Ru
    Lee, Tong-Yee
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (09) : 6164 - 6176
  • [23] Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding
    Zhen, Kai
    Lee, Mi Suk
    Sung, Jongmo
    Beack, Seungkwon
    Kim, Minje
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 (27) : 2159 - 2163
  • [24] End-to-end consensus using end-to-end channels
    Wiesmann, Matthias
    Defago, Xavier
    12TH PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING, PROCEEDINGS, 2006, : 341 - +
  • [25] End-to-End Distortion Modeling for Error-Resilient Screen Content Video Coding
    Tang, Tong
    Yin, Zhiyang
    Li, Jie
    Wang, Honggang
    Wu, Dapeng
    Wang, Ruyan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4458 - 4468
  • [26] A Complete End-To-End Open Source Toolchain for the Versatile Video Coding (VVC) Standard
    Wieckowski, Adam
    Lehmann, Christian
    Bross, Benjamin
    Marpe, Detlev
    Biatek, Thibaud
    Raulet, Mickael
    Le Feuvre, Jean
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 3795 - 3798
  • [27] Joint source-channel video coding based on the optimization of end-to-end distortions
    Lie, Wen-Nung
    Gao, Zhi-Wei
    Liu, Tung-Lin
    Jui, Ping-Chang
    ADVANCES IN IMAGE AND VIDEO TECHNOLOGY, PROCEEDINGS, 2006, 4319 : 842 - +
  • [28] DBVC: An End-to-End 3-D Deep Biomedical Video Coding Framework
    Xue, Dongmei
    Ma, Haichuan
    Li, Li
    Liu, Dong
    Xiong, Zhiwei
    Li, Houqiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (04) : 2922 - 2933
  • [29] Hybrid end-to-end distortion estimation and its application in error resilient video coding
    Wei, Xiaohui
    Yang, Hua
    Boyce, Jill M.
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PTS 1-3, PROCEEDINGS, 2007, : 837 - +
  • [30] MPNET: An End-to-End Deep Neural Network for Object Detection in Surveillance Video
    Wang, Hanyu
    Wang, Ping
    Qian, Xueming
    IEEE ACCESS, 2018, 6 : 30296 - 30308