Hybrid video coding scheme based on VVC and spatio-temporal attention convolution neural network

Citations: 0
Authors
He, Gang [1]
Xu, Kepeng [1]
Wu, Chang [1]
Ma, Zijia [1]
Wen, Xing [2]
Sun, Ming [2]
Affiliations
[1] Xidian University, Xi'an, People's Republic of China
[2] Kuaishou Technology, Beijing, People's Republic of China
DOI
10.1109/CVPRW56347.2022.00193
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this paper, we propose a hybrid video coding framework. The framework is built on the VVC (Versatile Video Coding) standard and adds an implicitly aligned multi-frame fusion model for subjective video quality enhancement. It improves compression efficiency in two ways. The first is a sequence-level dynamic rate control algorithm, which assigns an appropriate bitrate to each video so as to maximize overall video quality. The second is MAQE, a multi-frame implicit alignment video quality enhancement model. MAQE performs motion alignment through multiple convolutional kernels of different sizes, uses a residual aggregation layer to fuse the features of different frames, and then uses an enhanced attention module to adaptively rescale features based on spatio-temporal context, so that features from multiple frames are fused more effectively and the reconstructed frames are of higher quality. The proposed method is validated on the 0.1 Mbps and 1 Mbps bitrate tracks of the CLIC-2022 video compression task. Experimental results show that it achieves PSNR of 30.301 dB and 37.251 dB and MS-SSIM of 0.9368 and 0.9875 on the two tracks, respectively. This paper is a comprehensive presentation of the scheme used by the Night-Watch team in the CLIC-2022 video track.
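The abstract does not describe how the sequence-level rate control chooses each video's bitrate. As a rough illustration of the general idea only, the sketch below assumes each video has been encoded at a few candidate operating points, each a hypothetical `(size_in_bits, quality)` pair whose quality rises with size, and greedily spends a shared bit budget on whichever upgrade yields the most quality per extra bit. The function name `allocate`, the data layout, and the greedy rule are assumptions made for this example, not the authors' algorithm.

```python
from typing import List, Tuple

# One candidate operating point: (size_in_bits, quality_score).
# Hypothetical data, e.g. the same video encoded at several QPs.
Point = Tuple[int, float]


def allocate(videos: List[List[Point]], budget_bits: int) -> List[Point]:
    """Greedily pick one operating point per video under a total size budget."""
    # Order each video's candidates from smallest to largest encode.
    points = [sorted(v) for v in videos]
    choice = [0] * len(points)  # start every video at its cheapest point
    used = sum(p[0][0] for p in points)
    if used > budget_bits:
        raise ValueError("budget too small for the cheapest encodes")

    while True:
        best, best_ratio = None, 0.0
        for i, p in enumerate(points):
            nxt = choice[i] + 1
            if nxt >= len(p):
                continue  # this video is already at its largest point
            extra = p[nxt][0] - p[choice[i]][0]
            gain = p[nxt][1] - p[choice[i]][1]
            # Skip upgrades that do not help or do not fit in the budget.
            if extra <= 0 or gain <= 0 or used + extra > budget_bits:
                continue
            if gain / extra > best_ratio:
                best, best_ratio = i, gain / extra
        if best is None:
            # No affordable upgrade is left: return the chosen points.
            return [p[c] for p, c in zip(points, choice)]
        nxt = choice[best] + 1
        used += points[best][nxt][0] - points[best][choice[best]][0]
        choice[best] = nxt


if __name__ == "__main__":
    # Two hypothetical videos with three candidate encodes each.
    demo = [
        [(100, 30.0), (200, 34.0), (300, 35.0)],
        [(100, 36.0), (200, 36.5), (300, 36.8)],
    ]
    print(allocate(demo, budget_bits=400))
```

In this toy setup the budget flows to the video whose quality improves fastest with extra bits, which matches the stated intent of giving each video an appropriate share rather than an equal one. A greedy pass like this is only reliable when each video's quality-versus-size curve is roughly concave.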
Pages: 1790-1793
Page count: 4