An Energy-and-Area-Efficient CNN Accelerator for Universal Powers-of-Two Quantization

Cited by: 9
Authors
Xia, Tian [1]
Zhao, Boran [1]
Ma, Jian [1]
Fu, Gelin [1]
Zhao, Wenzhe [1]
Zheng, Nanning [1]
Ren, Pengju [1]
Affiliations
[1] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an 710049, Shaanxi, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Quantization (signal); Hardware; Integrated circuit modeling; Computational modeling; Shape; Training; Degradation; CNN accelerator; power-of-two quantization; circuit design; ASIC implementation; hardware system
DOI
10.1109/TCSI.2022.3227608
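Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline classification codes
0808; 0809
Abstract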
CNN model computation on edge devices is tightly constrained by limited resource and power budgets, which motivates low-bit quantization techniques that compress CNN models into 4-bit or lower formats to reduce model size and increase hardware efficiency. Most current low-bit quantization methods use uniform quantization, which maps weight and activation values onto evenly distributed levels and usually results in accuracy loss due to distribution mismatch. Meanwhile, some non-uniform quantization methods propose specialized representations that better match various distribution shapes but are usually difficult to accelerate efficiently in hardware. To achieve low-bit quantization with both high accuracy and hardware efficiency, this paper proposes Universal Power-of-Two (UPoT), a novel low-bit quantization method that represents values as the sum of multiple power-of-two values selected from a series of subsets. By updating the subset contents, UPoT can provide adaptive quantization levels for various distributions. For each CNN model layer, UPoT automatically searches for the optimized distribution that minimizes the quantization error. Moreover, we design an efficient accelerator system with specifically optimized power-of-two multipliers and requantization units. Evaluations show that the proposed architecture provides high-performance CNN inference with reduced circuit area and energy, and outperforms several mainstream CNN accelerators with 8x-65x higher area efficiency and 2x-19x higher energy efficiency. Further experiments with 4/3/2-bit quantization on ResNet18/50, MobileNet_V2 and EfficientNet models show that UPoT achieves high model accuracy, outperforming other state-of-the-art low-bit quantization methods by 0.3%-6%. The results indicate that our approach provides a highly efficient accelerator for low-bit CNN model quantization with low hardware overhead and good model accuracy.
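The core idea in the abstract (levels built as sums of power-of-two terms drawn from subsets, a search over subset contents that minimizes quantization error, and multiplication by such a level using shifts and adds) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the choice of one term per subset, the symmetric sign handling, the mean-squared-error criterion, the brute-force search over a hand-written candidate list, and the fixed-point shift scheme are all assumptions made for illustration.

```python
from itertools import product


def upot_levels(subsets):
    """Build quantization levels as sums of one term picked from each subset.

    Assumption: each subset holds 0 and/or powers of two, and levels are
    mirrored around zero. The paper's exact subset format is not given here.
    """
    sums = {sum(choice) for choice in product(*subsets)}
    return sorted(sums | {-s for s in sums})


def quantize(x, levels):
    """Map x to the nearest available level."""
    return min(levels, key=lambda q: abs(x - q))


def mse(values, levels):
    """Mean squared quantization error of values under the given levels."""
    return sum((v - quantize(v, levels)) ** 2 for v in values) / len(values)


def search_subsets(values, candidates):
    """Pick the candidate subset configuration with the lowest error.

    Assumption: an exhaustive search over a small candidate list stands in
    for whatever per-layer search procedure the paper actually uses.
    """
    return min(candidates, key=lambda subsets: mse(values, upot_levels(subsets)))


def pot_multiply(activation, exponents, frac_bits):
    """Multiply an integer activation by sum(2**-k for k in exponents).

    Uses only shifts and adds; the result is fixed-point with frac_bits
    fractional bits. Requires every k <= frac_bits.
    """
    return sum(activation << (frac_bits - k) for k in exponents)


if __name__ == "__main__":
    config_a = [(0, 2**-1, 2**-2), (0, 2**-3)]  # hypothetical subsets
    config_b = [(0, 2**0, 2**-1), (0, 2**-2)]   # hypothetical subsets
    weights = [0.12, 0.13, 0.37, 0.62]          # made-up sample values

    best = search_subsets(weights, [config_a, config_b])
    print("chosen subsets:", best)
    print("levels:", upot_levels(best))
    print("12 * 0.625 in Q3 fixed point:", pot_multiply(12, (1, 3), 3))
```

Changing the subset contents changes where the levels cluster, which is how a scheme of this kind can adapt to different value distributions; the shift-and-add product shows why power-of-two levels avoid a full hardware multiplier.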
Pages: 1242-1255
Page count: 14