A Fast Algorithm-Based Cost-Effective and Hardware-Efficient Unified Architecture Design of 4 x 4, 8 x 8, 16 x 16, and 32 x 32 Inverse Core Transforms for HEVC

Cited by: 7
Authors
Chang, Chia-Wei [1]
Hsu, Hao-Fan [1]
Fan, Chih-Peng [1]
Wu, Chung-Bin [1]
Chang, Robert Chen-Hao [1, 2]
Affiliations
[1] Natl Chung Hsing Univ, Dept Elect Engn, Taichung 402, Taiwan
[2] Natl Chi Nan Univ, Dept Elect Engn, Nantou 545, Taiwan
Keywords
Hardware sharing; Hardware efficiency; Fast transform; High-efficiency video coding (HEVC); Video decoding;
DOI
10.1007/s11265-015-0982-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This study develops a novel fast-algorithm-based hardware-sharing architecture for the 4 x 4, 8 x 8, 16 x 16, and 32 x 32 inverse core transforms in high-efficiency video coding (HEVC), with a cost-effective and highly hardware-efficient design. Exploiting the symmetry of the elements in the inverse core transform matrices, the core transform matrix is factorized into several submatrices. Based on the symmetry and similarity between the submatrices, the hardware of the (N/2) x (N/2) inverse core transform is shared with that of the N x N inverse core transform for N = 32, 16, and 8. Compared with the individual transform designs without hardware sharing, the proposed multiplierless transform architecture reduces the hardware overheads of adders and shifters by 32% and 36%, respectively. The hardware efficiency of the proposed architecture is up to 166% higher than that of several previous transform designs for HEVC, and up to 141% higher than that of field-programmable gate array (FPGA)-based 16-point transform designs. Implemented in 90-nm complementary metal-oxide-semiconductor (CMOS) technology from the Taiwan Semiconductor Manufacturing Company (TSMC), the proposed 1-D hardware-sharing scheme requires 115.7 K gates and achieves an operating frequency of up to 200 MHz. It can decode 4K x 2K (4096 x 2048 pixels) and 8K UHDTV (7680 x 4320 pixels) video in real time at up to 127 and 32 frames per second, respectively.
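The sharing idea in the abstract, where the (N/2) x (N/2) inverse transform is reused inside the N x N one, can be illustrated in software for N = 8. The following is a minimal sketch, not the authors' architecture: it uses plain integer multiplications rather than the multiplierless adder-and-shifter network the abstract describes, and it omits the scaling, rounding, and clipping stages of a real decoder. The function names `inverse_direct` and `inverse8_shared` are hypothetical; the matrices are the standard 4-point and 8-point HEVC core transform matrices.

```python
# HEVC 4-point and 8-point core transform matrices (rows are basis vectors).
M4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]
M8 = [
    [64,  64,  64,  64,  64,  64,  64,  64],
    [89,  75,  50,  18, -18, -50, -75, -89],
    [83,  36, -36, -83, -83, -36,  36,  83],
    [75, -18, -89, -50,  50,  89,  18, -75],
    [64, -64, -64,  64,  64, -64, -64,  64],
    [50, -89,  18,  75, -75, -18,  89, -50],
    [36, -83,  83, -36, -36,  83, -83,  36],
    [18, -50,  75, -89,  89, -75,  50, -18],
]

def inverse_direct(M, c):
    """Unscaled 1-D inverse transform: x = M^T * c."""
    n = len(M)
    return [sum(M[k][i] * c[k] for k in range(n)) for i in range(n)]

def inverse8_shared(c):
    """8-point inverse that reuses the 4-point inverse (hypothetical helper).

    Even-indexed rows of M8 are symmetric and their left halves equal M4,
    so the even coefficients go through the 4-point transform unchanged.
    Odd-indexed rows are antisymmetric, so only their left halves are needed.
    """
    even = inverse_direct(M4, c[0::2])  # shared 4-point block
    odd = [sum(M8[k][i] * c[k] for k in (1, 3, 5, 7)) for i in range(4)]
    first_half = [even[i] + odd[i] for i in range(4)]           # x[0..3]
    second_half = [even[i] - odd[i] for i in range(3, -1, -1)]  # x[4..7]
    return first_half + second_half

coeffs = [10, -3, 5, 0, 7, 2, -1, 4]  # arbitrary example coefficients
assert inverse8_shared(coeffs) == inverse_direct(M8, coeffs)
```

The same even-odd split applies recursively at N = 16 and N = 32, which is what allows one smaller transform block to serve every larger size.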
Pages: 69-89
Number of pages: 21
Related papers
26 in total
  • [1] A Fast Algorithm-Based Cost-Effective and Hardware-Efficient Unified Architecture Design of 4 × 4, 8 × 8, 16 × 16, and 32 × 32 Inverse Core Transforms for HEVC
    Chia-Wei Chang
    Hao-Fan Hsu
    Chih-Peng Fan
    Chung-Bin Wu
    Robert Chen-Hao Chang
    [J]. Journal of Signal Processing Systems, 2016, 82 : 69 - 89
  • [2] An optimized hardware architecture of 4x4, 8x8, 16x16 and 32x32 inverse transform for HEVC
    Kammoun, Manel
    Maamouri, Emna
    Ben Atitallah, Ahmed
    Masmoudi, Nouri
    [J]. 2016 2ND INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP), 2016, : 264 - 267
  • [3] Cost-effective hardware sharing architectures of fast 8x8 and 4x4 integer transforms for H.264/AVC
    Fan, Chih-Peng
    [J]. 2006 IEEE Asia Pacific Conference on Circuits and Systems, 2006, : 776 - 779
  • [4] High-efficiency Multiple 4x4 and 8x8 Inverse Transform Design With a Cost-effective Unified Architecture for Multistandard Video Decoders
    Chang, Chia-Wei
    Hsu, Hao-Fan
    Fan, Chih-Peng
    [J]. 2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS), 2014, : 507 - 510
  • [5] Low-Complexity Integrated Architecture of 4x4, 4x8, 8x4 and 8x8 Inverse Integer Transforms of VC-1
    Wang, Yi-Jung
    Chang, Chih Chi
    Wu, Guo Zua
    Chen, Oscal T. -C.
    [J]. 2009 52ND IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1 AND 2, 2009, : 543 - +
  • [6] Implementations of low-cost hardware sharing architectures for fast 8 x 8 and 4 x 4 integer transforms in H.264/AVC
    Fan, Chih-Peng
    Lin, Yu-Lian
    [J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2007, E90A (02) : 511 - 516
  • [7] AN AREA-EFFICIENT 4/8/16/32-POINT INVERSE DCT ARCHITECTURE FOR UHDTV HEVC DECODER
    Sun, Heming
    Zhou, Dajiang
    Zhu, Jiayi
    Kimura, Shinji
    Goto, Satoshi
    [J]. 2014 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING CONFERENCE, 2014, : 197 - 200
  • [8] High-efficiency and Cost-sharing Architecture Design of Fast Algorithm Based Multiple 4x4 and 8x8 Forward Transforms For Multi-standard Video Encoder
    Hsu, Hao-Fan
    Chang, Chia-Wei
    Fan, Chih-Peng
    [J]. 2016 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS), 2016, : 184 - 187
  • [9] A cost-effective 8x8 2-D IDCT core processor with folded architecture
    Chen, TH
    [J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 1999, 45 (02) : 333 - 339
  • [10] Cost Effective Hardware Sharing Architecture for Fast 1-D 8x8 Forward and Inverse Integer Transforms of H.264/AVC High Profile
    Su, Guo-An
    Fan, Chih-Peng
    [J]. 2008 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2008), VOLS 1-4, 2008, : 1332 - 1335