A Fast Algorithm-Based Cost-Effective and Hardware-Efficient Unified Architecture Design of 4 x 4, 8 x 8, 16 x 16, and 32 x 32 Inverse Core Transforms for HEVC

Cited by: 7
Authors
Chang, Chia-Wei [1]
Hsu, Hao-Fan [1]
Fan, Chih-Peng [1]
Wu, Chung-Bin [1]
Chang, Robert Chen-Hao [1, 2]
Affiliations
[1] Natl Chung Hsing Univ, Dept Elect Engn, Taichung 402, Taiwan
[2] Natl Chi Nan Univ, Dept Elect Engn, Nantou 545, Taiwan
Keywords
Hardware sharing; Hardware efficiency; Fast transform; High-efficiency video coding (HEVC); Video decoding
DOI
10.1007/s11265-015-0982-8
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This study develops a novel fast-algorithm-based hardware-sharing architecture for the 4 x 4, 8 x 8, 16 x 16, and 32 x 32 inverse core transforms in high-efficiency video coding (HEVC), with a cost-effective and highly hardware-efficient design. By exploiting the symmetry of the elements in the inverse core transform matrices, each core transform matrix is factorized into several submatrices. Based on the symmetry and similarity between the submatrices, the hardware of the (N/2) x (N/2) inverse core transform is shared with that of the N x N inverse core transform for N = 32, 16, and 8. Compared with implementing each transform separately without hardware sharing, the proposed multiplierless transform architecture reduces the hardware overheads of adders and shifters by 32 % and 36 %, respectively. The hardware efficiency of the proposed architecture is up to 166 % higher than that of several previous transform designs for HEVC, and up to 141 % higher than that of field-programmable gate array (FPGA)-based 16-point transform designs. Using 90-nm complementary metal-oxide-semiconductor (CMOS) technology from the Taiwan Semiconductor Manufacturing Company (TSMC), the proposed 1-D hardware-sharing scheme requires a gate count of 115.7 K to achieve an operational frequency of up to 200 MHz, and it can decode 4K x 2K (4096 x 2048 pixels) and 8K UHDTV (7680 x 4320 pixels) video in real time at up to 127 and 32 frames per second, respectively.
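The sharing idea in the abstract can be illustrated with the standard even-odd (partial butterfly) decomposition of the HEVC core transform. The sketch below is not taken from the paper and is not its multiplierless architecture; it only shows, for N = 8, that the even-indexed coefficients can be processed by the same 4-point inverse transform, which is presumably the kind of symmetry the authors exploit. The function names (`inverse_direct`, `inverse8_shared`) are hypothetical, and the scaling and rounding stages of a real decoder are omitted.

```python
# HEVC core transform coefficient matrices for N = 4 and N = 8.
T4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]
T8 = [
    [64,  64,  64,  64,  64,  64,  64,  64],
    [89,  75,  50,  18, -18, -50, -75, -89],
    [83,  36, -36, -83, -83, -36,  36,  83],
    [75, -18, -89, -50,  50,  89,  18, -75],
    [64, -64, -64,  64,  64, -64, -64,  64],
    [50, -89,  18,  75, -75, -18,  89, -50],
    [36, -83,  83, -36, -36,  83, -83,  36],
    [18, -50,  75, -89,  89, -75,  50, -18],
]


def inverse_direct(T, X):
    """Unscaled 1-D inverse transform: x[n] = sum_k T[k][n] * X[k]."""
    return [sum(T[k][n] * X[k] for k in range(len(T))) for n in range(len(T[0]))]


def inverse8_shared(X):
    """8-point inverse transform that reuses the 4-point inverse transform.

    Even rows of T8 are symmetric and their left half equals T4, so the
    even-indexed coefficients go through the 4-point transform unchanged.
    Odd rows are antisymmetric, so only their left half (a 4 x 4 block)
    is needed; a final butterfly combines the two halves.
    """
    even = inverse_direct(T4, X[0::2])            # shared 4-point hardware
    odd_block = [row[:4] for row in T8[1::2]]     # left half of the odd rows
    odd = inverse_direct(odd_block, X[1::2])
    first_half = [even[n] + odd[n] for n in range(4)]
    second_half = [even[3 - n] - odd[3 - n] for n in range(4)]
    return first_half + second_half


if __name__ == "__main__":
    coeffs = [10, -3, 4, 0, 7, 1, -2, 5]
    print(inverse8_shared(coeffs) == inverse_direct(T8, coeffs))
```

The same split applies recursively: the even half of the 16-point transform is the 8-point transform, and the even half of the 32-point transform is the 16-point transform, which matches the N = 32, 16, and 8 sharing chain stated in the abstract.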
Pages: 69-89
Page count: 21