An Energy-and-Area-Efficient CNN Accelerator for Universal Powers-of-Two Quantization

Cited by: 9
Authors
Xia, Tian [1]
Zhao, Boran [1]
Ma, Jian [1]
Fu, Gelin [1]
Zhao, Wenzhe [1]
Zheng, Nanning [1]
Ren, Pengju [1]
Affiliations
[1] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Quantization (signal); Hardware; Integrated circuit modeling; Computational modeling; Shape; Training; Degradation; CNN accelerator; power-of-two quantization; circuit design; ASIC implementation; hardware system; HARDWARE;
DOI
10.1109/TCSI.2022.3227608
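Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes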
0808 ; 0809 ;
Abstract
CNN model computation on edge devices is tightly constrained by limited resource and power budgets, which motivates low-bit quantization techniques that compress CNN models into 4-bit or lower formats to reduce model size and increase hardware efficiency. Most current low-bit quantization methods use uniform quantization, which maps weight and activation values onto evenly distributed levels and usually incurs accuracy loss due to distribution mismatch. Meanwhile, some non-uniform quantization methods propose specialized representations that better match various distribution shapes but are usually difficult to accelerate efficiently on hardware. To achieve low-bit quantization with both high accuracy and hardware efficiency, this paper proposes Universal Power-of-Two (UPoT), a novel low-bit quantization method that represents values as the sum of multiple power-of-two values selected from a series of subsets. By updating the subset contents, UPoT can provide adaptive quantization levels for various distributions. For each CNN model layer, UPoT automatically searches for the optimized distribution that minimizes the quantization error. Moreover, we design an efficient accelerator system with specifically optimized power-of-two multipliers and requantization units. Evaluations show that the proposed architecture provides high-performance CNN inference with reduced circuit area and energy, and outperforms several mainstream CNN accelerators with 8x-65x higher area efficiency and 2x-19x higher energy efficiency. Further experiments with 4/3/2-bit quantization on ResNet18/50, MobileNet_V2 and EfficientNet models show that UPoT achieves high model accuracy, outperforming other state-of-the-art low-bit quantization methods by 0.3%-6%. The results indicate that our approach provides a highly efficient accelerator for low-bit CNN model quantization with low hardware overhead and good model accuracy.
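The idea described in the abstract (quantization levels formed by adding one power-of-two term from each of several subsets, with the subsets chosen to minimize quantization error) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the subset contents `SUBSETS_A` and `SUBSETS_B`, the sample weights, the function names, the use of mean squared error as the error measure, and the choice between just two candidates in place of the paper's search. The shift-and-add multiplier is likewise only the generic way a power-of-two product can be computed, not the paper's circuit.

```python
from itertools import product


def upot_levels(subsets):
    """Return all distinct sums formed by picking one term from each subset."""
    return sorted({sum(combo) for combo in product(*subsets)})


def quantize(x, levels):
    """Map x to the nearest level (ties resolve to the smaller level)."""
    return min(levels, key=lambda q: (abs(x - q), q))


def quantization_error(values, levels):
    """Mean squared error between the values and their quantized versions."""
    return sum((v - quantize(v, levels)) ** 2 for v in values) / len(values)


def shift_add_multiply(activation, exponents):
    """Multiply an integer by sum(2**-e) using only right shifts and adds.

    Right shifts truncate, so the result is exact only when the activation
    is divisible by 2**e for every exponent e.
    """
    return sum(activation >> e for e in exponents)


# Hypothetical subsets (illustrative only): zero or a power of two per term.
SUBSETS_A = [[0.0, 2**-1, 2**-2], [0.0, 2**-3]]
SUBSETS_B = [[0.0, 2**-2, 2**-3], [0.0, 2**-4]]

# Hypothetical layer weights concentrated near zero.
weights = [0.05, 0.1, 0.2, 0.3]

candidates = {"A": upot_levels(SUBSETS_A), "B": upot_levels(SUBSETS_B)}
best = min(candidates, key=lambda name: quantization_error(weights, candidates[name]))
print("levels A:", candidates["A"])
print("levels B:", candidates["B"])
print("lower-error candidate:", best)
```

Because the sample weights cluster near zero, the candidate with finer levels in that range gives the lower error, which is the sense in which updating subset contents adapts the levels to a distribution. A real search would cover many more candidates and operate on actual layer statistics.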
Pages: 1242 - 1255
Number of pages: 14
Related papers
50 in total
  • [41] An Energy-Efficient, Unified CNN Accelerator for Real-Time Multi-Object Semantic Segmentation for Autonomous Vehicle
    Jung, Jueun
    Kim, Seungbin
    Jang, Wuyoung
    Seo, Bokyoung
    Lee, Kyuho Jason
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71 (05) : 2093 - 2104
  • [42] Area and Energy Efficient 2D Max-Pooling For Convolutional Neural Network Hardware Accelerator
    Zhao, Bin
    Chong, Yi Sheng
    Anh Tuan Do
    IECON 2020: THE 46TH ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY, 2020, : 423 - 427
  • [43] An Energy-Efficient Mixed-Bit CNN Accelerator With Column Parallel Readout for ReRAM-Based In-Memory Computing
    Liu, Dingbang
    Zhou, Haoxiang
    Mao, Wei
    Liu, Jun
    Han, Yuliang
    Man, Changhai
    Wu, Qiuping
    Guo, Zhiru
    Huang, Mingqiang
    Luo, Shaobo
    Lv, Mingsong
    Chen, Quan
    Yu, Hao
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2022, 12 (04) : 821 - 834
  • [44] An Energy-Efficient Mixed-Bit ReRAM-based Computing-in-Memory CNN Accelerator with Fully Parallel Readout
    Liu, Dingbang
    Mao, Wei
    Zhou, Haoxiang
    Liu, Jun
    Wu, Qiuping
    Hong, Haigiao
    Yu, Hao
    2022 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS, APCCAS, 2022, : 515 - 519
  • [45] Sagitta: An Energy-Efficient Sparse 3D-CNN Accelerator for Real-Time 3-D Understanding
    Zhou, Changchun
    Liu, Min
    Qiu, Siyuan
    Cao, Xugang
    Fu, Yuzhe
    He, Yifan
    Jiao, Hailong
    IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (23) : 20703 - 20717
  • [46] Novel CNN-Based AP2D-Net Accelerator: An Area and Power Efficient Solution for Real-Time Applications on Mobile FPGA
    Li, Shuai
    Sun, Kuangyuan
    Luo, Yukui
    Yadav, Nandakishor
    Choi, Ken
    ELECTRONICS, 2020, 9 (05)
  • [47] An Energy-Efficient BNN Accelerator With Two-Stage Value Prediction for Sparse-Edge Gesture Recognition
    Zhang, Yongliang
    Rong, Yitong
    Duan, Xuyang
    Yang, Zhen
    Li, Qiang
    Guo, Ziyu
    Cheng, Xu
    Zeng, Xiaoyang
    Han, Jun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71 (01) : 320 - 333
  • [48] BFP-CIM: Data-Free Quantization with Dynamic Block-Floating-Point Arithmetic for Energy-Efficient Computing-In-Memory-based Accelerator
    Chang, Cheng-Yang
    Huang, Chi-Tse
    Chuang, Yu-Chuan
    Chou, Kuang-Chao
    Wu, An-Yeu
    29TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC 2024, 2024, : 545 - 550
  • [49] Energy-efficient two-hop extension protocol for wireless body area networks
    Lin, Chih-Shin
    Chuang, Po-Jen
    IET WIRELESS SENSOR SYSTEMS, 2013, 3 (01) : 37 - 56
  • [50] Energy-efficient Two-Hop Transmission Prioritization Scheme for Wireless Body Area Networks
    Cabacas, Regin
    Yang, Hyunho
    Ra, In-Ho
    2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2014, : 1213 - 1218