Parallelized RDOQ Algorithm and Fully Pipelined Hardware Architecture for AVS3 Video Coding

Cited by: 2
Authors
Huang, Xiaofeng [1 ,2 ]
Tang, Ran [1 ,3 ]
Pan, Rui [3 ]
Yin, Haibing [1 ,2 ]
Wang, Zhao [4 ]
Wang, Shiqi [5 ]
Ma, Siwei
Affiliations
[1] Hangzhou Dianzi Univ, Sch Commun Engn, Hangzhou 310018, Peoples R China
[2] Peking Univ, Adv Inst Informat Technol, Hangzhou 311215, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Elect & Informat, Hangzhou 310018, Peoples R China
[4] Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China
[5] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Hardware; Costs; Computer architecture; Quantization (signal); Video coding; Transforms; Estimation; RDOQ; AVS3; zig-zag scanline; parallelized algorithm; hardware architecture; ZERO BLOCK DETECTION; HEVC; QUANTIZATION;
DOI
10.1109/TCSVT.2023.3349278
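Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract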
Rate-distortion optimized quantization (RDOQ) provides significant coding gain in the third generation of the Audio Video coding Standard (AVS3). However, the high computational complexity and strong data dependency of RDOQ impede hardware implementation. To address these issues, we propose a zig-zag scanline-level parallelized RDOQ algorithm and its fully pipelined hardware architecture for AVS3 video coding. For algorithm optimization, we update the run-level context for rate estimation within each zig-zag scanline and propose an efficient form of the RD cost calculation in the optimal coefficient level (OCL) decision step. In the last significant coefficient (LSC) position decision step, a greedy-strategy-based algorithm is proposed so that the determination process can run in parallel. Moreover, the proposed parallelized RDOQ algorithm is accelerated with single instruction multiple data (SIMD) on the Intel x86 platform. For hardware architecture design, a fully pipelined hardware architecture with nine pipeline stages is proposed. This design can process multiple transform units in parallel when the height is less than 32. Experimental results show that the proposed algorithm achieves 31.37%, 28.58%, and 28.53% time savings at the cost of 0.25%, 0.26%, and 0.27% Bjøntegaard delta rate (BD-Rate) increase on average under the all intra (AI), random access (RA), and low delay B (LDB) configurations, respectively. The hardware implementation processes 32 coefficients per cycle, and the area consumption is 1223.2K logic gates when working at 471.2 MHz. These results show that the proposed algorithm and hardware architecture design achieve a good trade-off between coding efficiency and hardware throughput.
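The abstract's OCL decision step rests on the usual rate-distortion cost J = D + λ·R. As a rough illustration only, the sketch below shows a generic, textbook-style per-coefficient level decision; it is not the paper's parallelized algorithm, and it does not model the run-level context or the LSC decision. The function names `quantize_rdo` and `toy_rate_bits`, and the rate model itself, are hypothetical and chosen purely for illustration.

```python
def toy_rate_bits(level):
    """Hypothetical rate model (an assumption, not the AVS3 entropy coder):
    1 bit for a zero level, otherwise a cost growing with the level's bit length."""
    return 1 if level == 0 else 2 * level.bit_length() + 1


def quantize_rdo(coeff, qstep, lam, rate_bits):
    """Choose the quantized level minimizing J = D + lam * R.

    Candidates are 0, the floor level, and floor + 1, a common RDOQ simplification.
    Returns (signed level, RD cost of the chosen level).
    """
    mag = abs(coeff)
    base = int(mag // qstep)
    best_level, best_cost = 0, float("inf")
    for level in sorted({0, base, base + 1}):
        dist = (mag - level * qstep) ** 2      # squared reconstruction error
        cost = dist + lam * rate_bits(level)   # RD cost J = D + lam * R
        if cost < best_cost:                   # ties keep the smaller level
            best_level, best_cost = level, cost
    sign = -1 if coeff < 0 else 1
    return sign * best_level, best_cost


if __name__ == "__main__":
    for lam in (0.0, 1.0, 100.0):
        print(lam, quantize_rdo(10.0, 4.0, lam, toy_rate_bits))
```

With λ = 0 the choice reduces to minimum-distortion quantization; as λ grows, the rate term dominates and the decision is pushed toward zero. Each coefficient here is decided independently, which is the property a parallel design needs; in a real encoder the rate of a level depends on context from neighbouring coefficients, which is the data dependency the abstract refers to.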
Pages: 6430 - 6444
Page count: 15
Related papers
50 in total
  • [41] Intra Prediction Fast Algorithm in AVS3 based on Image Texture Characteristics
    Wang, Yizhao
    Zhang, Chaobo
    Sun, Songlin
    PROCEEDINGS OF ISCIT 2021: 2021 20TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES (ISCIT), 2021, : 6 - 10
  • [42] Multi-Level Pipelined Parallel Hardware Architecture for High Throughput Motion and Disparity Estimation in Multiview Video Coding
    Zatt, Bruno
    Shafique, Muhammad
    Bampi, Sergio
    Henkel, Joerg
    2011 DESIGN, AUTOMATION & TEST IN EUROPE (DATE), 2011, : 1448 - 1453
  • [43] AN ARCHITECTURE AND AN ALGORITHM FOR FULLY DIGITAL CORRECTION OF MONOLITHIC PIPELINED ADCS
    SOENEN, EG
    GEIGER, RL
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-ANALOG AND DIGITAL SIGNAL PROCESSING, 1995, 42 (03): : 143 - 153
  • [44] A Fully Pipelined Architecture for the LOCO-I Compression Algorithm
    Merlino, Pierantonio
    Abramo, Antonio
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2009, 17 (07) : 967 - 971
  • [45] Scalable Fully Pipelined Hardware Architecture for In-Network Aggregated AllReduce Communication
    Liu, Yao
    Zhang, Junyi
    Liu, Shuo
    Wang, Qiaoling
    Dai, Wangchen
    Cheung, Ray Chak Chung
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2021, 68 (10) : 4194 - 4206
  • [46] An efficient all-zero block detection algorithm for high efficiency video coding with RDOQ
    Yin, Haibing
    Cai, Hao
    Yang, Enhui
    Zhou, Yang
    Wu, Jiao
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2018, 60 : 79 - 90
  • [47] High Throughput CBAC Hardware Encoder With Bin Merging for AVS 2.0 Video Coding
    Choi, Young-Kyu
    Lee, Hyuk-Jae
    Chae, Soo-Ik
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (11) : 4439 - 4453
  • [48] A Real-Time Ultra-High Definition Video Decoder of AVS3 on Heterogeneous Systems
    Han, Xu
    Pan, Xiaofei
    Wang, Shiqi
    Wang, Shanshe
    Gao, Wen
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5595 - 5607
  • [49] Hardware Friendly Mode Decision Algorithm for High Definition AVS Video Encoder
    Yin, Hai bing
    Wang, Xiao Han
    Zhu, Xiang Kui
    Qi, Honggang
    PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOLS 1-9, 2009, : 227 - +
  • [50] Improved FFSBM Algorithm and Its VLSI Architecture for AVS Video Standard
    Li Zhang
    Don Xie
    Di Wu
    Journal of Computer Science and Technology, 2006, 21 : 378 - 382