Deep video compression based on Long-range Temporal Context Learning

Authors
Wu, Kejun [1]
Li, Zhenxing [1]
Yang, You [1]
Liu, Qiong [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China
Keywords
Deep learning; Video compression; Computational photography; Temporal context learning
DOI
10.1016/j.cviu.2024.104127
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video compression allows for efficient storage and transmission of data, benefiting imaging and vision applications such as computational imaging, photography, and displays by delivering high-quality videos. To exploit more informative video contexts, we propose DVCL, a novel Deep Video Compression framework based on Long-range Temporal Context Learning. Aiming at high coding performance, this new compression paradigm makes full use of long-range temporal correlations derived from multiple reference frames to learn richer contexts. Motion vectors (MVs) are estimated to represent the motion relations between video frames. Using these MVs, a long-range temporal context learning (LTCL) module extracts context information from multiple reference frames, so that more accurate and informative temporal contexts can be learned and constructed. The long-range temporal contexts serve as conditions for the contextual encoder and decoder, which generate the predicted frames. To address the challenge of imbalanced training, we develop a multi-stage training strategy that ensures the whole DVCL framework is trained progressively and stably. Extensive experiments demonstrate that the proposed DVCL achieves the highest objective and subjective quality while maintaining relatively low complexity. Specifically, it achieves average bitrate savings of 25.30% and 45.75% over the x265 codec at the same PSNR and MS-SSIM, respectively.
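The abstract reports average bitrate savings at equal quality but does not say how they were computed. Such figures are commonly reported as a Bjøntegaard-delta-style rate difference (BD-rate), so the sketch below assumes that kind of metric. It is a simplified variant that uses piecewise-linear interpolation of log-bitrate over quality instead of the usual cubic fit. The function name `bd_rate` and all rate-quality points are hypothetical and are not taken from the paper.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over strictly ascending xs at x."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x is outside the interpolation range")


def _area(xs, ys, lo, hi):
    """Trapezoidal integral of the piecewise-linear curve over [lo, hi]."""
    knots = [lo] + [x for x in xs if lo < x < hi] + [hi]
    vals = [_interp(k, xs, ys) for k in knots]
    return sum((b - a) * (va + vb) / 2
               for a, b, va, vb in zip(knots, knots[1:], vals, vals[1:]))


def bd_rate(anchor, test):
    """Average bitrate change (%) of `test` relative to `anchor` at equal quality.

    Each argument is a list of (bitrate, quality) points, e.g. (kbps, PSNR in dB).
    Negative results mean the test codec needs fewer bits than the anchor.
    Simplification (assumption): linear interpolation rather than a cubic fit.
    """
    a = sorted(anchor, key=lambda p: p[1])
    t = sorted(test, key=lambda p: p[1])
    qa, ra = [p[1] for p in a], [math.log(p[0]) for p in a]
    qt, rt = [p[1] for p in t], [math.log(p[0]) for p in t]
    # Compare only over the quality range covered by both curves.
    lo, hi = max(qa[0], qt[0]), min(qa[-1], qt[-1])
    if lo >= hi:
        raise ValueError("the two curves share no quality range")
    avg_log_diff = (_area(qt, rt, lo, hi) - _area(qa, ra, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1) * 100


# Made-up points for illustration only: the test codec uses 75% of the
# anchor's bitrate at every quality level.
anchor_points = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test_points = [(0.75 * rate, quality) for rate, quality in anchor_points]
print(f"{bd_rate(anchor_points, test_points):.2f}%")
```

With these illustrative points the test curve sits at a constant 0.75x rate ratio, so the function returns about -25, that is, a saving of roughly 25%. Real evaluations typically average such values over several test sequences.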
Pages: 11
Related papers
50 in total
  • [1] Action Recognition with Bootstrapping based Long-range Temporal Context Attention
    Liu, Ziming
    Gao, Guangyu
    Qin, A. K.
    Wu, Tong
    Liu, Chi Harold
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 583 - 591
  • [2] Learned Video Compression With Efficient Temporal Context Learning
    Jin, Dengchao
    Lei, Jianjun
    Peng, Bo
    Pan, Zhaoqing
    Li, Li
    Ling, Nam
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 3188 - 3198
  • [3] Deep Learning Based Video Compression
    Ji, Kang Da
    Hlavacs, Helmut
    INTELLIGENT TECHNOLOGIES FOR INTERACTIVE ENTERTAINMENT, INTETAIN 2021, 2022, 429 : 127 - 141
  • [4] A Deep Learning Decoder for Long-Range Communication Systems
    Pascual, Damian
    Tanner, Simon
    Vanska, Mickey
    Wattenhofer, Roger
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 1668 - 1672
  • [5] Deep learning for software-based turbulence mitigation in long-range imaging
    Nieuwenhuizen, Robert
    Schutte, Klamer
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING IN DEFENSE APPLICATIONS, 2019, 11169
  • [6] Long-Term Temporal Context Gathering for Neural Video Compression
    Qi, Linfeng
    Jia, Zhaoyang
    Li, Jiahao
    Li, Bin
    Li, Houqiang
    Lu, Yan
    COMPUTER VISION - ECCV 2024, PT LXVI, 2025, 15124 : 305 - 322
  • [7] Learning Long-Range Relationships for Temporal Aircraft Anomaly Detection
    Zhang, Da
    Gao, Junyu
    Li, Xuelong
    IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2024, 60 (05) : 6385 - 6395
  • [8] LRTD: long-range temporal dependency based active learning for surgical workflow recognition
    Xueying Shi
    Yueming Jin
    Qi Dou
    Pheng-Ann Heng
    International Journal of Computer Assisted Radiology and Surgery, 2020, 15 : 1573 - 1584
  • [9] LRTD: long-range temporal dependency based active learning for surgical workflow recognition
    Shi, Xueying
    Jin, Yueming
    Dou, Qi
    Heng, Pheng-Ann
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2020, 15 (09) : 1573 - 1584
  • [10] Long-Range Spatio-Temporal Modeling of Video with Application to Fire Detection
    Ravichandran, Avinash
    Soatto, Stefano
    COMPUTER VISION - ECCV 2012, PT II, 2012, 7573 : 329 - 342