End-to-End Neural Video Coding Using a Compound Spatiotemporal Representation

Times cited: 10
Authors
Liu, Haojie [1]
Lu, Ming [1]
Chen, Zhiqi [2]
Cao, Xun [1]
Ma, Zhan [1]
Wang, Yao [2]
Affiliations
[1] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210093, Jiangsu, Peoples R China
[2] NYU, Tandon Sch Engn, New York, NY 11201 USA
Funding
National Natural Science Foundation of China
Keywords
Image coding; Spatiotemporal phenomena; Decoding; Chemical reactors; Video coding; Feature extraction; Optical flow; Learnt video coding; spatiotemporal recurrent neural network; optical flow; deformable convolutions; video prediction; COMPRESSION;
DOI
10.1109/TCSVT.2022.3150014
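Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809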
Abstract
Recent years have witnessed rapid advances in learnt video coding. Most algorithms have relied solely on vector-based motion representation and resampling (e.g., optical-flow-based bilinear sampling) to exploit inter-frame redundancy. Despite the great success of adaptive kernel-based resampling (e.g., adaptive convolutions and deformable convolutions) in video prediction for uncompressed videos, integrating such approaches with rate-distortion optimization for inter-frame coding has been less successful. Recognizing that each resampling solution offers unique advantages in regions with different motion and texture characteristics, we propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by these two approaches. Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module using information from the current and multiple past frames. We further design a one-to-many decoder pipeline that generates multiple outputs from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements, and combines them adaptively to achieve more accurate inter prediction. Experiments show that our proposed inter coding system provides better motion-compensated prediction and is more robust to occlusions and complex motions. Together with a jointly trained intra coder and residual coder, the overall learnt hybrid coder achieves state-of-the-art coding efficiency in the low-delay scenario compared with the traditional H.264/AVC and H.265/HEVC, as well as recently published learning-based methods, in terms of both PSNR and MS-SSIM.
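The abstract names two resampling routes (vector-based and adaptive kernel-based) plus a per-pixel compensation mode map that combines them. The following is a minimal, hypothetical sketch in plain Python of those three ingredients on a single-channel frame. It is not the authors' implementation. The function names `bilinear_warp`, `kernel_resample` and `hybrid_prediction` are invented for illustration. The flow field, per-pixel kernels and mode map are supplied by hand rather than decoded from a CSTR. Deformable convolutions, the RIA module and texture enhancement are omitted. The linear blend `m * p_flow + (1 - m) * p_kernel` is an assumption about how a soft mode map could be applied.

```python
import math
from typing import List, Tuple

# Hypothetical types and helper names for illustration only; not from the paper.
Image = List[List[float]]
Flow = List[List[Tuple[float, float]]]   # per-pixel (dy, dx) displacement into the reference
Kernels = List[List[List[List[float]]]]  # per-pixel k x k weight grid (k odd)


def _pixel(img: Image, y: int, x: int) -> float:
    """Read img[y][x], clamping out-of-range coordinates to the nearest border pixel."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def bilinear_warp(ref: Image, flow: Flow) -> Image:
    """Vector-based resampling: sample the reference frame at (y + dy, x + dx)
    by bilinear interpolation of the four surrounding pixels."""
    h, w = len(ref), len(ref[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy, sx = y + dy, x + dx
            y0, x0 = math.floor(sy), math.floor(sx)
            fy, fx = sy - y0, sx - x0  # fractional offsets in [0, 1)
            out[y][x] = ((1 - fy) * (1 - fx) * _pixel(ref, y0, x0)
                         + (1 - fy) * fx * _pixel(ref, y0, x0 + 1)
                         + fy * (1 - fx) * _pixel(ref, y0 + 1, x0)
                         + fy * fx * _pixel(ref, y0 + 1, x0 + 1))
    return out


def kernel_resample(ref: Image, kernels: Kernels) -> Image:
    """Adaptive kernel-based resampling: each output pixel is a weighted sum of
    the k x k neighbourhood centred on it, with weights that differ per pixel."""
    h, w = len(ref), len(ref[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            k = kernels[y][x]
            r = len(k) // 2  # kernel radius; k is assumed square with odd size
            out[y][x] = sum(k[i][j] * _pixel(ref, y + i - r, x + j - r)
                            for i in range(len(k)) for j in range(len(k)))
    return out


def hybrid_prediction(p_flow: Image, p_kernel: Image, mode_map: Image) -> Image:
    """Assumed soft mode selection: m * p_flow + (1 - m) * p_kernel per pixel, m in [0, 1]."""
    return [[m * a + (1 - m) * b for a, b, m in zip(ra, rb, rm)]
            for ra, rb, rm in zip(p_flow, p_kernel, mode_map)]


if __name__ == "__main__":
    ref = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    half_px = [[(0.0, 0.5)] * 3 for _ in range(3)]  # sample half a pixel to the right
    identity = [[[[0, 0, 0], [0, 1, 0], [0, 0, 0]]] * 3 for _ in range(3)]  # copy centre pixel
    p_flow = bilinear_warp(ref, half_px)
    p_kernel = kernel_resample(ref, identity)
    mode = [[0.5] * 3 for _ in range(3)]
    print(hybrid_prediction(p_flow, p_kernel, mode)[0])  # [1.25, 2.25, 3.0]
```

A mode map of 1 everywhere reduces the result to pure flow-based warping, and 0 to pure kernel-based resampling. Intermediate values let each region lean on whichever predictor suits its motion and texture, which is the intuition the abstract gives for combining the two.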
Pages: 5650-5662
Page count: 13
Related papers (50 in total)
  • [1] End-to-end Distributed Video Coding
    Zhou, Junwei
    Lv, Ting
    Yi, XiangBo
    DCC 2022: 2022 DATA COMPRESSION CONFERENCE (DCC), 2022, pp. 496
  • [2] An End-to-End No-Reference Video Quality Assessment Method With Hierarchical Spatiotemporal Feature Representation
    Shen, Wenhao
    Zhou, Mingliang
    Liao, Xingran
    Jia, Weijia
    Xiang, Tao
    Fang, Bin
    Shang, Zhaowei
    IEEE TRANSACTIONS ON BROADCASTING, 2022, 68 (03), pp. 651-660
  • [3] End-to-end Stereo Audio Coding Using Deep Neural Networks
    Lim, Wootaek
    Jang, Inseon
    Beack, Seungkwon
    Sung, Jongmo
    Lee, Taejin
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, pp. 860-864
  • [4] End-to-End Learning of Motion Representation for Video Understanding
    Fan, Lijie
    Huang, Wenbing
    Gan, Chuang
    Ermon, Stefano
    Gong, Boqing
    Huang, Junzhou
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, pp. 6016-6025
  • [5] Learning-based End-to-End Video Compression Using Predictive Coding
    de Oliveira, Matheus C.
    Martins, Luiz G. R.
    Jung, Henrique Costa
    Guerin Jr, Nilson Donizete
    da Silva, Renam Castro
    Peixoto, Eduardo
    Macchiavello, Bruno
    Hung, Edson M.
    Testoni, Vanessa
    Freitas, Pedro Garcia
    2021 34TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI 2021), 2021, pp. 160-167
  • [6] END-TO-END OPTIMIZED SPEECH CODING WITH DEEP NEURAL NETWORKS
    Kankanahalli, Srihari
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, pp. 2521-2525
  • [7] Optimum end-to-end distortion estimation for error resilient video coding
    Zhang, Y
    Huang, QM
    Lu, Y
    Gao, W
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2004, PT 2, PROCEEDINGS, 2004, 3332, pp. 513-520
  • [8] Optimum end-to-end distortion estimation for error resilient video coding
    Zhang, Yuan
    Huang, Qingming
    Lu, Yan
    Gao, Wen
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2004, 3332, pp. 513-520
  • [9] An End-to-End Video Steganography Network Based on a Coding Unit Mask
    Chai, Huanhuan
    Li, Zhaohong
    Li, Fan
    Zhang, Zhenzhen
    ELECTRONICS, 2022, 11 (07)
  • [10] An End-to-End Video Coding Method via Adaptive Vision Transformer
    Yang, Haoyan
    Zhou, Mingliang
    Shang, Zhaowei
    Pu, Huayan
    Luo, Jun
    Huang, Xiaoxu
    Wang, Shilong
    Cao, Huajun
    Wei, Xuekai
    Xian, Weizhi
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2024, 38 (01)