End-to-End Rate-Distortion Optimized Learned Hierarchical Bi-Directional Video Compression

Cited by: 25
Authors
Yilmaz, M. Akin [1]
Tekalp, A. Murat [1]
Affiliations
[1] Koc Univ, Dept Elect & Elect Engn, TR-34450 Istanbul, Turkey
Keywords
Bidirectional control; Image coding; Video compression; Motion compensation; Optimization; Entropy; Video codecs; Learned video compression; learned bi-directional motion compensation; flow field sub-sampling; flow vector prediction; end-to-end optimization;
DOI
10.1109/TIP.2021.3138300
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Conventional video compression (VC) methods are based on motion-compensated transform coding, and the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to the combinatorial nature of the end-to-end optimization problem. Learned VC allows end-to-end rate-distortion (R-D) optimized training of the nonlinear transform, motion, and entropy models simultaneously. Most works on learned VC consider end-to-end optimization of a sequential video codec based on an R-D loss averaged over pairs of successive frames. It is well known in conventional VC that hierarchical, bi-directional coding outperforms sequential compression because of its ability to use both past and future reference frames. This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that we achieve the best R-D results reported for learned VC schemes to date in both PSNR and MS-SSIM. Compared to conventional video codecs, our end-to-end optimized codec outperforms both the x265 and SVT-HEVC encoders ("veryslow" preset) in R-D performance in PSNR and MS-SSIM, as well as the HM 16.23 reference software in MS-SSIM. We present ablation studies showing the performance gains due to the proposed novel tools, such as learned masking, flow-field subsampling, and temporal flow vector prediction. The models and instructions to reproduce our results are available at https://github.com/makinyilmaz/LHBDC/.
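To illustrate what "hierarchical, bi-directional coding" means in the abstract, the following is a minimal sketch of the conventional dyadic coding order used in hierarchical B-frame structures. It is not taken from the LHBDC repository: the function name `hierarchical_b_order` and the assumption that the two end frames of a group of pictures are already decoded are hypothetical choices made for illustration only.

```python
from collections import deque


def hierarchical_b_order(first, last):
    """List (frame, past_ref, future_ref) triples in coding order.

    Assumption (illustrative, not from the paper): frames `first` and
    `last` are already coded and serve as the initial references. Each
    in-between frame is predicted from the nearest coded frame on either
    side, starting with the midpoint and refining level by level.
    """
    order = []
    pending = deque([(first, last)])  # intervals whose midpoint is not yet coded
    while pending:
        lo, hi = pending.popleft()
        if hi - lo < 2:
            continue  # no frame strictly between lo and hi
        mid = (lo + hi) // 2
        order.append((mid, lo, hi))  # mid uses past ref lo and future ref hi
        pending.append((lo, mid))
        pending.append((mid, hi))
    return order


if __name__ == "__main__":
    for frame, past, future in hierarchical_b_order(0, 8):
        print(f"frame {frame}: past ref {past}, future ref {future}")
```

In this sketch every intermediate frame has one past and one future reference, which is the property the abstract contrasts with sequential coding, where each frame is predicted from past frames only.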
Pages: 974-983
Page count: 10