An Energy-and-Area-Efficient CNN Accelerator for Universal Powers-of-Two Quantization

Citations: 9
Authors
Xia, Tian [1 ]
Zhao, Boran [1 ]
Ma, Jian [1 ]
Fu, Gelin [1 ]
Zhao, Wenzhe [1 ]
Zheng, Nanning [1 ]
Ren, Pengju [1 ]
Affiliations
[1] Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Quantization (signal); Hardware; Integrated circuit modeling; Computational modeling; Shape; Training; Degradation; CNN accelerator; power-of-two quantization; circuit design; ASIC implementation; hardware system
DOI
10.1109/TCSI.2022.3227608
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Codes
0808; 0809
Abstract
CNN inference on edge devices is tightly constrained by limited resource and power budgets, which motivates low-bit quantization techniques that compress CNN models into 4-bit or lower formats to reduce model size and increase hardware efficiency. Most current low-bit quantization methods use uniform quantization, which maps weight and activation values onto evenly distributed levels and usually incurs accuracy loss due to distribution mismatch. Meanwhile, some non-uniform quantization methods propose specialized representations that better match various distribution shapes but are usually difficult to accelerate efficiently on hardware. To achieve low-bit quantization with both high accuracy and hardware efficiency, this paper proposes Universal Power-of-Two (UPoT), a novel low-bit quantization method that represents each value as the sum of multiple power-of-two terms selected from a series of subsets. By updating the subset contents, UPoT provides adaptive quantization levels for various distributions. For each CNN model layer, UPoT automatically searches for the configuration that minimizes the quantization error. Moreover, we design an efficient accelerator system with specifically optimized power-of-two multipliers and requantization units. Evaluations show that the proposed architecture provides high-performance CNN inference with reduced circuit area and energy, outperforming several mainstream CNN accelerators with higher area efficiency (8x-65x) and energy efficiency (2x-19x). Further experiments with 4/3/2-bit quantization on ResNet18/50, MobileNet_V2 and EfficientNet show that UPoT achieves high model accuracy, outperforming other state-of-the-art low-bit quantization methods by 0.3%-6%. These results indicate that our approach provides a highly efficient accelerator for low-bit quantized CNN models with low hardware overhead and good model accuracy.
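As a rough illustration of the additive power-of-two idea sketched in the abstract, the Python snippet below quantizes values to the nearest sum of one power-of-two term per subset and searches, per layer, for the subset configuration with the lowest quantization error. All function names (upot_quantize, search_subsets, pot_multiply) and the exhaustive search are illustrative assumptions, not the paper's published algorithm; the shift-based multiply only hints at why power-of-two weights let the accelerator avoid full multipliers.

import numpy as np

def upot_quantize(x, subsets):
    # Enumerate every representable level: one term per subset, summed.
    # Each subset holds candidate signed power-of-two values, e.g.
    # [0, 2**-1, -2**-1, 2**-2, -2**-2].
    grids = np.meshgrid(*subsets, indexing="ij")
    levels = np.unique(sum(grids).ravel())
    # Snap every input value to its nearest representable level.
    idx = np.abs(x.reshape(-1, 1) - levels.reshape(1, -1)).argmin(axis=1)
    return levels[idx].reshape(x.shape)

def search_subsets(weights, candidate_configs):
    # Per-layer search (exhaustive here purely for clarity): keep the
    # subset configuration with the smallest mean-squared error.
    best, best_err = None, np.inf
    for subsets in candidate_configs:
        err = np.mean((weights - upot_quantize(weights, subsets)) ** 2)
        if err < best_err:
            best, best_err = subsets, err
    return best, best_err

def pot_multiply(activation_int, exponent):
    # A power-of-two weight 2**exponent turns multiplication into a bit
    # shift, the core simplification behind power-of-two multipliers.
    return activation_int << exponent if exponent >= 0 else activation_int >> -exponent

For instance, with subsets = [np.array([0., 0.5, -0.5, 1., -1.]), np.array([0., 0.25, -0.25])], the representable levels are all sums of one entry from each subset; updating the subset contents reshapes this level grid to fit a layer's weight distribution, which is the adaptivity the abstract describes.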
Pages: 1242-1255 (14 pages)