A Highly Pipelined and Highly Parallel VLSI Architecture of CABAC Encoder for UHDTV Applications

Times cited: 1
Authors
Fu, Chen [1]
Sun, Heming [2]
Zhang, Zhiqiang [1]
Zhou, Jinjia [1]
Affiliations
[1] Hosei Univ, Grad Sch Sci & Engn, Tokyo 1848584, Japan
[2] Waseda Univ, Waseda Res Inst Sci & Engn, Tokyo 1698050, Japan
Funding
Japan Society for the Promotion of Science (JSPS)
Keywords
high efficiency video coding (HEVC); entropy coding; context adaptive binary arithmetic coding (CABAC); video coding; hardware design;
DOI
10.3390/s23094293
CLC number
O65 [Analytical chemistry]
Subject classification codes
070302; 081704
Abstract
Recently, specifically designed video codecs have been preferred because of the growth of video data from Internet of Things (IoT) devices. Context-Adaptive Binary Arithmetic Coding (CABAC) is the entropy coding module widely used in recent video coding standards such as HEVC/H.265 and VVC/H.266, and it is a well-known throughput bottleneck because of its strong data dependencies. Because the context model required by the current bin often depends on the result of the previous bin, the context model cannot be prefetched early enough, which results in pipeline stalls. To solve this problem, we propose a prediction-based context model prefetching strategy that effectively eliminates the clock cycles spent reading context models from memory. Moreover, we propose a multi-result context model update (MCMU) to reduce the critical path delay of context model updates in a multi-bin-per-clock architecture. Furthermore, we apply pre-range-update and pre-renormalization techniques that exploit the incomplete dependency within the encoding process to reduce the routing delay of the multiplexed binary arithmetic encoder (BAE). To speed up processing further, we process four regular bins and several bypass bins in parallel using a variable bypass bin incorporation (VBBI) technique. Finally, a quad-loop cache is developed to improve the compatibility of data interactions between the entropy encoder and the other video encoder modules. As a result, the pipeline architecture based on the context model prefetching strategy removes up to 45.66% of the coding time caused by regular-bin stalls, and the parallel architecture saves on average 29.25% of the coding time spent on context model updates, with the Quantization Parameter (QP) set to 22. The throughput of the proposed parallel architecture reaches 2191 Mbin/s, which is sufficient to meet the requirements of 8K Ultra High Definition Television (UHDTV).
Additionally, the hardware efficiency (Mbin/s per k gates) of the proposed architecture is higher than that of existing advanced pipelined and parallel architectures.
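The abstract's core argument is that each regular bin needs the context model (probability state and MPS value) produced by the previous bin, whereas bypass bins skip the context model entirely. The Python sketch below is not the authors' architecture; it only illustrates that dependency with an encoder flow modeled on the H.264/AVC and HEVC CABAC engine (9-bit range, 10-bit low, deferred "outstanding" bits during renormalization). Assumptions: the LPS sub-range table is approximated from the CABAC probability recurrence for the four quantized ranges 256-319, 320-383, 384-447 and 448-511 rather than copied from the standard, the state-transition table is derived the same way, the final flush is simplified, and every name (`next_context`, `CabacEncoder`, `CabacDecoder`, `round_trip`, `RANGE_LPS`, ...) is hypothetical. `next_context` forms both candidate updates before selecting one, as a loose software analogy of the multi-result update idea, and a matching decoder is included so the round trip can be checked.

```python
# Sketch of a CABAC-style binary arithmetic coder. Assumption: the tables below are
# approximations from the CABAC probability recurrence, not the normative HEVC tables.
ALPHA = (0.01875 / 0.5) ** (1 / 63)
P_LPS = [0.5 * ALPHA ** s for s in range(63)]  # LPS probability of each state
RANGE_LPS = [[round(p * (288 + 64 * q)) for q in range(4)] for p in P_LPS]  # 4 range bins
TRANS_MPS = [min(s + 1, 62) for s in range(63)]
# After an LPS the LPS probability becomes alpha*p + (1 - alpha); snap to nearest state.
TRANS_LPS = [min(range(63), key=lambda t: abs(P_LPS[t] - (ALPHA * p + 1 - ALPHA)))
             for p in P_LPS]

def next_context(state, mps, bin_val):
    # Hypothetical MCMU-style update: both candidate results are formed before
    # bin_val is used; bin_val only selects one (a software analogy of a mux).
    if_mps = (TRANS_MPS[state], mps)
    if_lps = (TRANS_LPS[state], 1 - mps if state == 0 else mps)
    return if_mps if bin_val == mps else if_lps

class CabacEncoder:  # H.264/HEVC-style flow: 9-bit range, 10-bit low, outstanding bits
    def __init__(self):
        self.low, self.range, self.outstanding = 0, 510, 0
        self.first_bit, self.bits = True, []

    def _put_bit(self, b):
        if not self.first_bit:  # top bit of the initial interval [0, 510) is always 0
            self.bits.append(b)
        self.first_bit = False
        self.bits.extend([1 - b] * self.outstanding)  # resolve deferred bits
        self.outstanding = 0

    def _settle(self, half):
        # Emit the bit above `half` once it is known, or defer it (a carry may follow).
        if self.low < half:
            self._put_bit(0)
        elif self.low >= 2 * half:
            self.low -= 2 * half
            self._put_bit(1)
        else:
            self.low -= half
            self.outstanding += 1

    def encode_regular(self, ctx, bin_val):
        state, mps = ctx  # ctx is a mutable [state, mps] context model
        r_lps = RANGE_LPS[state][(self.range >> 6) & 3]
        self.range -= r_lps
        if bin_val != mps:  # LPS: take the upper sub-interval
            self.low, self.range = self.low + self.range, r_lps
        ctx[0], ctx[1] = next_context(state, mps, bin_val)
        while self.range < 256:  # renormalization: one bit decision per doubling
            self._settle(256)
            self.range, self.low = self.range << 1, self.low << 1

    def encode_bypass(self, bin_val):
        # Equiprobable bin: no context model and no range split, only a shift of low.
        self.low = (self.low << 1) + (self.range if bin_val else 0)
        self._settle(512)

    def finish(self):
        # Simplified flush (assumption): emit all 10 bits of low.
        self._put_bit((self.low >> 9) & 1)
        self.bits.extend((self.low >> i) & 1 for i in range(8, -1, -1))
        return self.bits

class CabacDecoder:
    def __init__(self, bits):
        self._bits, self.range, self.offset = iter(bits), 510, 0
        for _ in range(9):
            self.offset = (self.offset << 1) | next(self._bits, 0)

    def decode_regular(self, ctx):
        state, mps = ctx
        r_lps = RANGE_LPS[state][(self.range >> 6) & 3]
        self.range -= r_lps
        bin_val = mps
        if self.offset >= self.range:  # offset lies in the LPS sub-interval
            bin_val, self.offset, self.range = 1 - mps, self.offset - self.range, r_lps
        ctx[0], ctx[1] = next_context(state, mps, bin_val)
        while self.range < 256:
            self.range <<= 1
            self.offset = (self.offset << 1) | next(self._bits, 0)
        return bin_val

    def decode_bypass(self):
        self.offset = (self.offset << 1) | next(self._bits, 0)
        bin_val = int(self.offset >= self.range)
        self.offset -= self.range * bin_val
        return bin_val

def round_trip(plan):
    """Encode (is_bypass, bin) pairs with one context, decode them; return (bins, bits)."""
    enc, ctx = CabacEncoder(), [0, 0]
    for bypass, b in plan:
        if bypass:
            enc.encode_bypass(b)
        else:
            enc.encode_regular(ctx, b)
    bits = enc.finish()
    dec, ctx = CabacDecoder(bits), [0, 0]
    return [dec.decode_bypass() if bp else dec.decode_regular(ctx) for bp, _ in plan], bits
```

Calling `round_trip(plan)` with a list of `(is_bypass, bin)` pairs returns the decoded bins together with the emitted bit list, and the decoded bins should equal the input. In this sketch `encode_bypass` only shifts and adds into `low`, while every `encode_regular` call has to wait for the context returned by the previous call, which mirrors the dependency that makes regular bins the harder part to parallelize.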
Pages: 19