A Highly Pipelined and Highly Parallel VLSI Architecture of CABAC Encoder for UHDTV Applications

Times cited: 1
Authors
Fu, Chen [1]
Sun, Heming [2]
Zhang, Zhiqiang [1]
Zhou, Jinjia [1]
Affiliations
[1] Hosei Univ, Grad Sch Sci & Engn, Tokyo 1848584, Japan
[2] Waseda Univ, Waseda Res Inst Sci & Engn, Tokyo 1698050, Japan
Funding
Japan Society for the Promotion of Science (JSPS)
Keywords
high efficiency video coding (HEVC); entropy coding; context adaptive binary arithmetic coding (CABAC); video coding; hardware design
DOI
10.3390/s23094293
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Recently, purpose-built video codecs have been preferred as the volume of video data from Internet of Things (IoT) devices grows. Context Adaptive Binary Arithmetic Coding (CABAC) is the entropy coding module widely used in recent video coding standards such as HEVC/H.265 and VVC/H.266. CABAC is a well-known throughput bottleneck because of its strong data dependencies: the context model required by the current bin often depends on the result of the previous bin, so it cannot be prefetched early enough, which results in pipeline stalls. To solve this problem, we propose a prediction-based context model prefetching strategy that effectively eliminates the clock cycles spent fetching context models from memory. Moreover, we propose a multi-result context model update (MCMU) to reduce the critical path delay of context model updates in a multi-bin/clock architecture. Furthermore, we apply pre-range update and pre-renormalization techniques to reduce the routing delay of the multiplexed binary arithmetic encoder (BAE), taking advantage of the incomplete dependence within the encoding process. To speed up processing further, we propose to process four regular bins and several bypass bins in parallel with a variable bypass bin incorporation (VBBI) technique. Finally, a quad-loop cache is developed to improve the compatibility of data interactions between the entropy encoder and the other video encoder modules. As a result, the pipeline architecture based on the context model prefetching strategy can remove up to 45.66% of the coding time caused by regular-bin stalls, and the parallel architecture can save 29.25% of the coding time spent on model updates on average, with the Quantization Parameter (QP) set to 22. At the same time, the throughput of the proposed parallel architecture reaches 2191 Mbin/s, which is sufficient to meet the requirements of 8K Ultra High Definition Television (UHDTV).
Additionally, the hardware efficiency (Mbins/s per k gates) of the proposed architecture is higher than that of existing advanced pipeline and parallel architectures.
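The abstract does not describe how MCMU works internally. As a hedged illustration of the kind of critical-path problem it targets, the sketch below contrasts a serial chain of context model updates with a generic precompute-and-select approach (similar in spirit to a carry-select adder): the context model after every possible outcome of the next few bins is computed from the starting context alone, and once the bins are known only a selection (multiplexer) step remains. This is not the authors' circuit. The `Context` type, the `update` rule (a made-up simplification, not the exact HEVC state transition tables), and the helper names `serial_update`, `precompute_candidates`, and `select_update` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Sequence, Tuple

MAX_STATE = 62  # hypothetical: mirrors the 63 probability states (0..62) of HEVC CABAC


@dataclass(frozen=True)
class Context:
    state: int  # probability state index; 0 = least confident estimate
    mps: int    # most probable symbol, 0 or 1


def update(ctx: Context, bin_val: int) -> Context:
    """Simplified, hypothetical update rule (NOT the exact HEVC transition tables)."""
    if bin_val == ctx.mps:
        # MPS observed: become more confident
        return Context(min(ctx.state + 1, MAX_STATE), ctx.mps)
    if ctx.state == 0:
        # LPS observed at the least confident state: swap the MPS
        return Context(0, 1 - ctx.mps)
    # LPS observed: become less confident
    return Context(max(ctx.state - 2, 0), ctx.mps)


def serial_update(ctx: Context, bins: Sequence[int]) -> Context:
    """Baseline: each update must wait for the previous bin's result."""
    for b in bins:
        ctx = update(ctx, b)
    return ctx


def precompute_candidates(ctx: Context, n: int) -> Dict[Tuple[int, ...], Context]:
    """Context after every possible sequence of n bins.

    The candidates depend only on the starting context, so in hardware the
    2**n results could be computed while the bins are still being produced.
    """
    return {bins: serial_update(ctx, bins) for bins in product((0, 1), repeat=n)}


def select_update(candidates: Dict[Tuple[int, ...], Context], bins: Sequence[int]) -> Context:
    """Once the actual bins are known, a single selection (mux) step remains."""
    return candidates[tuple(bins)]


if __name__ == "__main__":
    ctx0 = Context(state=10, mps=1)
    cands = precompute_candidates(ctx0, 2)
    print(select_update(cands, [1, 0]))  # Context(state=9, mps=1)
```

The number of candidates grows as 2**n, so a precompute-and-select scheme of this kind is only practical for a small number of same-context bins per clock; this trade-off between area and critical-path delay is typical of multi-bin/clock CABAC designs.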
Pages: 19