Parallelized RDOQ Algorithm and Fully Pipelined Hardware Architecture for AVS3 Video Coding

Cited by: 2

Authors:

Huang, Xiaofeng ^{[1,2]}

Tang, Ran ^{[1,3]}

Pan, Rui ^{[3]}

Yin, Haibing ^{[1,2]}

Wang, Zhao ^{[4]}

Wang, Shiqi ^{[5]}

Ma, Siwei

Affiliations:

[1] Hangzhou Dianzi Univ, Sch Commun Engn, Hangzhou 310018, Peoples R China

[2] Peking Univ, Adv Inst Informat Technol, Hangzhou 311215, Peoples R China

[3] Hangzhou Dianzi Univ, Sch Elect & Informat, Hangzhou 310018, Peoples R China

[4] Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China

[5] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 07

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Hardware; Costs; Computer architecture; Quantization (signal); Video coding; Transforms; Estimation; RDOQ; AVS3; zig-zag scanline; parallelized algorithm; hardware architecture; ZERO BLOCK DETECTION; HEVC; QUANTIZATION;

DOI:

10.1109/TCSVT.2023.3349278

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Rate-distortion optimized quantization (RDOQ) provides significant coding gain in the third generation of the Audio Video coding Standard (AVS3). However, the high computational complexity and strong data dependency of RDOQ impede hardware implementation. To address these issues, we propose a zig-zag scanline-level parallelized RDOQ algorithm and its fully pipelined hardware architecture for AVS3 video coding. For algorithm optimization, we update the run-level context for rate estimation in the inner zig-zag scanline and propose an efficient RD cost calculation form in the optimal coefficient level (OCL) decision step. In the last significant coefficient (LSC) position decision step, a greedy-strategy-based algorithm is proposed to optimize the determination process in parallel. Moreover, the proposed parallelized RDOQ algorithm is accelerated by single instruction multiple data (SIMD) on the Intel x86 platform. For hardware architecture design, a fully pipelined hardware architecture is proposed with nine pipeline stages. This design can process multiple transform units in parallel when their height is less than 32. Experimental results show that the proposed algorithm achieves 31.37%, 28.58%, and 28.53% time savings at the cost of 0.25%, 0.26%, and 0.27% Bjøntegaard delta rate (BD-Rate) increases on average under all intra (AI), random access (RA), and low delay B (LDB) configurations, respectively. The hardware implementation achieves a throughput of 32 coefficients per cycle, and the area consumption is 1223.2 K logic gates when working at 471.2 MHz. These results indicate that the proposed algorithm and hardware architecture design achieve a good trade-off between coding efficiency and hardware throughput.


Pages: 6430-6444

Page count: 15

50 records in total

[1] A Fast CTU-level SAO Algorithm and Its Hardware Architecture for AVS3 Video Coding
Lin, Hao
Wen, Yingbo
Xiang, Guoqing
Qu, Xinyu
Zhang, Peng
Yan, Wei
Digest of Technical Papers - IEEE International Conference on Consumer Electronics, 2024,
[2] Recent Development of AVS Video Coding Standard: AVS3
Zhang, Jiaqi
Jia, Chuanmin
Lei, Meng
Wang, Shanshe
Ma, Siwei
Gao, Wen
2019 PICTURE CODING SYMPOSIUM (PCS), 2019,
[3] Efficient Fast Algorithm and Parallel Hardware Architecture for Intra Prediction of AVS3
Cai, Zhanyuan
Gao, Wei
2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
[4] AVS3 coding added to ETSI standards on video
China Standardization, 2023, (05) : 17 - 17
[5] PERFORMANCE EVALUATION FOR AVS3 VIDEO CODING STANDARD
Zheng, Xiaozhen
Liao, Qingmin
Wang, Yueming
Guo, Ze
Wang, Jianglin
Zhou, Yan
2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), 2020,
[6] INTRA BLOCK COPY IN AVS3 VIDEO CODING STANDARD
Wang, Yingbin
Xu, Xiaozhong
Liu, Shan
2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), 2020,
[7] Architecture Design of AVS3 Fractional Motion Estimation for 4K UHD Video Coding
Tong, Sikai
Zeng, Yuning
Xiang, Guoqing
Huang, Xiaofeng
Zhang, Peng
Zhao, Liping
Yan, Wei
2023 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS, ICCE, 2023,
[8] Scanline-based fast algorithm and pipelined hardware design of rate-distortion optimized quantization for AVS3
Zhao, Jiachen
Yang, Fan
Huang, Xiaofeng
Xiang, Gwoqing
Zhang, Peng
Zhao, Liping
Yan, Wei
2023 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS, ICCE, 2023,
[9] A 3.1 Gbin/s advanced entropy coding hardware design for AVS3
Cai, Yujie
Li, Wei
Zeng, Xiaoyang
Fan, Yibo
Zhang, Peng
Xiang, Guoqing
Yin, Haibing
2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 2017 - 2021
[10] An Efficient Real-Time Hardware Architecture for Deblocking Filter in AVS3
Wang, Shaokang
Huang, Xiaofeng
Xiang, Guoqing
Zhu, Xizhong
Yang, Jiaojiao
Zhang, Peng
Jia, Huizhu
Xie, Xiaodong
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2561 - 2566
