An Energy-and-Area-Efficient CNN Accelerator for Universal Powers-of-Two Quantization

Cited by: 9
Authors
Xia, Tian [1 ]
Zhao, Boran [1 ]
Ma, Jian [1 ]
Fu, Gelin [1 ]
Zhao, Wenzhe [1 ]
Zheng, Nanning [1 ]
Ren, Pengju [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Quantization (signal); Hardware; Integrated circuit modeling; Computational modeling; Shape; Training; Degradation; CNN accelerator; power-of-two quantization; circuit design; ASIC implementation; hardware system; HARDWARE;
DOI
10.1109/TCSI.2022.3227608
CLC classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline codes
0808 ; 0809 ;
Abstract
CNN model computation on edge devices is tightly constrained by limited resource and power budgets, which motivates low-bit quantization techniques that compress CNN models into 4-bit or lower formats to reduce model size and increase hardware efficiency. Most current low-bit quantization methods use uniform quantization that maps weight and activation values onto evenly distributed levels, which usually causes accuracy loss due to distribution mismatch. Meanwhile, some non-uniform quantization methods propose specialized representations that better match various distribution shapes but are usually difficult to accelerate efficiently on hardware. To achieve low-bit quantization with high accuracy and hardware efficiency, this paper proposes Universal Power-of-Two (UPoT), a novel low-bit quantization method that represents each value as the addition of multiple power-of-two values selected from a series of subsets. By updating the subset contents, UPoT can provide adaptive quantization levels for various distributions. For each CNN model layer, UPoT automatically searches for the optimized distribution that minimizes the quantization error. Moreover, we design an efficient accelerator system with specifically optimized power-of-two multipliers and requantization units. Evaluations show that the proposed architecture provides high-performance CNN inference with reduced circuit area and energy, and outperforms several mainstream CNN accelerators with higher area efficiency (8x-65x) and energy efficiency (2x-19x). Further experiments with 4/3/2-bit quantization on ResNet18/50, MobileNet_V2 and EfficientNet show that UPoT achieves high model accuracy, outperforming other state-of-the-art low-bit quantization methods by 0.3%-6%. The results indicate that our approach provides a highly efficient accelerator for low-bit CNN model quantization with low hardware overheads and good model accuracy.
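The abstract's core idea, representing each quantization level as a sum of power-of-two terms drawn from small, updatable subsets, can be sketched in a few lines. This is a hypothetical illustration of the general additive power-of-two scheme, not the paper's actual algorithm: the subset contents, the use of `None` to encode a zero term, and the nearest-level search are all assumptions made here for demonstration.

```python
# Illustrative additive power-of-two quantization in the spirit of UPoT:
# each level is the sum of one power-of-two term per exponent subset.
# The subsets and search strategy below are assumptions, not the paper's method.
import itertools

def build_levels(exponent_subsets):
    """Enumerate all representable levels as sums of one term per subset.

    Each subset is a tuple of exponents; None encodes a zero term, so a
    level may use fewer than len(exponent_subsets) nonzero terms.
    """
    levels = set()
    for exps in itertools.product(*exponent_subsets):
        levels.add(sum(0.0 if e is None else 2.0 ** e for e in exps))
    return sorted(levels)

def quantize(x, levels):
    """Map x to the nearest representable level (sign handled separately)."""
    sign = -1.0 if x < 0 else 1.0
    return sign * min(levels, key=lambda lv: abs(lv - abs(x)))

# Two example subsets: coarse and fine power-of-two terms.
subsets = [(None, -1, -2), (None, -3, -4)]
levels = build_levels(subsets)
print(quantize(0.6, levels))   # snaps 0.6 to the nearest sum, 0.625
```

In this toy setting, updating the subset contents per layer (as the abstract describes) changes which sums are representable, so the level grid can be reshaped to match a layer's weight or activation distribution while every multiplication stays a pair of shifts plus an add.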
Pages: 1242-1255
Page count: 14