Parallelized RDOQ Algorithm and Fully Pipelined Hardware Architecture for AVS3 Video Coding

Cited by: 2
Authors
Huang, Xiaofeng [1 ,2 ]
Tang, Ran [1 ,3 ]
Pan, Rui [3 ]
Yin, Haibing [1 ,2 ]
Wang, Zhao [4 ]
Wang, Shiqi [5 ]
Ma, Siwei
Affiliations
[1] Hangzhou Dianzi Univ, Sch Commun Engn, Hangzhou 310018, Peoples R China
[2] Peking Univ, Adv Inst Informat Technol, Hangzhou 311215, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Elect & Informat, Hangzhou 310018, Peoples R China
[4] Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China
[5] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Hardware; Costs; Computer architecture; Quantization (signal); Video coding; Transforms; Estimation; RDOQ; AVS3; zig-zag scanline; parallelized algorithm; hardware architecture; ZERO BLOCK DETECTION; HEVC; QUANTIZATION;
DOI
10.1109/TCSVT.2023.3349278
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract
Rate-distortion optimized quantization (RDOQ) provides significant coding gain in the third generation of the Audio Video coding Standard (AVS3). However, the high computational complexity and strong data dependency of RDOQ impede hardware implementation. To address these issues, we propose a zig-zag scanline-level parallelized RDOQ algorithm and its fully pipelined hardware architecture for AVS3 video coding. For algorithm optimization, we update the run-level context for rate estimation within each zig-zag scanline and propose an efficient form of the RD cost calculation in the optimal coefficient level (OCL) decision step. In the last significant coefficient (LSC) position decision step, a greedy-strategy-based algorithm is proposed so that the determination process can run in parallel. Moreover, the proposed parallelized RDOQ algorithm is accelerated with single instruction multiple data (SIMD) instructions on the Intel x86 platform. For hardware architecture design, a fully pipelined architecture with nine pipeline stages is proposed. This design can process multiple transform units in parallel when their height is less than 32. Experimental results show that the proposed algorithm achieves time savings of 31.37%, 28.58%, and 28.53% with average Bjøntegaard delta rate (BD-Rate) increases of 0.25%, 0.26%, and 0.27% under the all intra (AI), random access (RA), and low delay B (LDB) configurations, respectively. The hardware implementation processes 32 coefficients per cycle, and the area consumption is 1223.2K logic gates at an operating frequency of 471.2 MHz. These results show that the proposed algorithm and hardware architecture achieve a good trade-off between coding efficiency and hardware throughput.
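For readers unfamiliar with the level decision that RDOQ performs, the following is a minimal sketch of a generic per-coefficient choice that minimizes J = D + λ·R. It is not the paper's OCL formulation: the candidate set, the function names (`toy_rate_bits`, `rd_cost`, `choose_level`), and in particular the rate model are assumptions made for illustration. A real AVS3 encoder estimates rate from context-dependent entropy-coder tables, which is the source of the data dependency the abstract mentions.

```python
def toy_rate_bits(level):
    """Hypothetical rate model (an assumption, not AVS3's context-based estimate)."""
    if level == 0:
        return 1.0
    # Exp-Golomb-like length for the magnitude plus sign and significance bits.
    return 2 * (level.bit_length() - 1) + 3


def rd_cost(magnitude, level, step, lam, rate_bits=toy_rate_bits):
    """Rate-distortion cost J = D + lambda * R for one candidate level."""
    recon = level * step                      # uniform reconstruction
    distortion = (magnitude - recon) ** 2     # squared error
    return distortion + lam * rate_bits(level)


def choose_level(coeff, step, lam, rate_bits=toy_rate_bits):
    """Pick the quantized level with the lowest RD cost for one coefficient."""
    magnitude = abs(coeff)
    base = int(magnitude / step + 0.5)        # plain rounding quantizer
    # Candidates commonly tested in RDOQ-style search: rounded level, one below, zero.
    candidates = sorted({base, max(base - 1, 0), 0})
    best = min(candidates, key=lambda lv: rd_cost(magnitude, lv, step, lam, rate_bits))
    return best if coeff >= 0 else -best
```

With `lam = 0` the choice reduces to plain rounding, while a very large `lam` drives every coefficient to zero; practical values lie in between and are derived from the quantization parameter.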
Pages: 6430 - 6444
Page count: 15
Related papers
50 in total
  • [31] An Ultralow Complexity String Matching Approach to Screen Content Coding in AVS3
    Yang, Yufen
    Zhou, Kailun
    Zhao, Liping
    Lin, Tao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (09) : 3714 - 3718
  • [32] A Novel Rate Control Algorithm for AVS Video Coding
    Zhang, Qian
    Fang, Yong
    Wang, Chao
    2007 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-15, 2007, : 2900 - 2902
  • [33] A block motion estimation algorithm for AVS video coding
    Institute of Information Engineering, Xiangtan University, Xiangtan 411105, China
    Unknown
    Gaojishu Tongxin, 2009, 1 (29-32):
  • [34] A parallelized and pipelined datapath to implement ISODATA algorithm for rosette scan images on a reconfigurable hardware
    Rahimi, Ehsan
    Shokouhi, Shahriar B.
    Sadr, Ali
    GRC: 2007 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, PROCEEDINGS, 2007, : 433 - 436
  • [35] Fast intra coding in AVS3 based on direct non-first pre-coding skip
    Cao, Xueyan
    Lin, Tao
    Zhao, Liping
    Yang, Yufen
    Zhou, Kailun
    Wei, Hu
    Chen, Xianyi
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 98
  • [36] SVT-AVS3: An Open-Source High-Performance AVS3 Encoder With Scalable Video Technology
    Ren, Huiwen
    Wang, Shanshe
    Ma, Siwei
    Gao, Wen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 3291 - 3301
  • [37] Decision Tree Based Early Termination Algorithm for Affine Prediction in AVS3
    Wu, Jiacheng
    Sun, Songlin
    Zhang, Jiaqi
    Han, Xu
    2023 IEEE 25TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, MMSP, 2023,
  • [38] Fast CU Partition Decision Algorithm for AVS3 Based on Frequency Domain
    Xu, Chenggang
    Wu, Yi
    Chen, Lei
    Liu, Zihao
    Cao, Ceyao
    2022 21st International Symposium on Communications and Information Technologies, ISCIT 2022, 2022, : 195 - 198
  • [39] GCOTSC: Green Coding Techniques for Online Teaching Screen Content Implemented in AVS3
    Zhao, Liping
    Yan, Zhuge
    Wang, Zehao
    Wang, Xu
    Hu, Keli
    Liu, Huawen
    Lin, Tao
    IEEE TRANSACTIONS ON BROADCASTING, 2024, 70 (01) : 174 - 182
  • [40] Implementation of Pipelined Hardware Architecture for AES Algorithm using FPGA
    Kumar, J. Senthil
    Mahalakshmi, C.
    2014 INTERNATIONAL CONFERENCE ON COMMUNICATION AND NETWORK TECHNOLOGIES (ICCNT), 2014, : 260 - 264