An Energy-and-Area-Efficient CNN Accelerator for Universal Powers-of-Two Quantization

Cited by: 9
Authors
Xia, Tian [1 ]
Zhao, Boran [1 ]
Ma, Jian [1 ]
Fu, Gelin [1 ]
Zhao, Wenzhe [1 ]
Zheng, Nanning [1 ]
Ren, Pengju [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Quantization (signal); Hardware; Integrated circuit modeling; Computational modeling; Shape; Training; Degradation; CNN accelerator; power-of-two quantization; circuit design; ASIC implementation; hardware system; HARDWARE;
DOI
10.1109/TCSI.2022.3227608
CLC classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline codes
0808 ; 0809 ;
Abstract
CNN model computation on edge devices is tightly constrained by limited resource and power budgets, which motivates low-bit quantization techniques that compress CNN models into 4-bit or lower formats to reduce model size and increase hardware efficiency. Most current low-bit quantization methods use uniform quantization that maps weight and activation values onto evenly distributed levels, which usually causes accuracy loss due to distribution mismatch. Meanwhile, some non-uniform quantization methods propose specialized representations that better match various distribution shapes but are usually difficult to accelerate efficiently on hardware. To achieve low-bit quantization with high accuracy and hardware efficiency, this paper proposes Universal Power-of-Two (UPoT), a novel low-bit quantization method that represents each value as the addition of multiple power-of-two values selected from a series of subsets. By updating the subset contents, UPoT can provide adaptive quantization levels for various distributions. For each CNN model layer, UPoT automatically searches for the optimized distribution that minimizes the quantization error. Moreover, we design an efficient accelerator system with specifically optimized power-of-two multipliers and requantization units. Evaluations show that the proposed architecture provides high-performance CNN inference with reduced circuit area and energy, and outperforms several mainstream CNN accelerators with higher area efficiency (8x-65x) and energy efficiency (2x-19x). Further experiments with 4/3/2-bit quantization on ResNet18/50, MobileNet_V2 and EfficientNet show that UPoT achieves high model accuracy, outperforming other state-of-the-art low-bit quantization methods by 0.3%-6%. The results indicate that our approach provides a highly efficient accelerator for low-bit CNN model quantization with low hardware overheads and good model accuracy.
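The abstract's core idea, representing each quantization level as a sum of power-of-two terms drawn from small, updatable subsets, can be sketched in a few lines. This is a hypothetical illustration of the general additive power-of-two scheme, not the paper's actual algorithm: the subset contents, the use of `None` to encode a zero term, and the nearest-level search are all assumptions made here for demonstration.

```python
# Illustrative additive power-of-two quantization in the spirit of UPoT:
# each level is the sum of one power-of-two term per exponent subset.
# The subsets and search strategy below are assumptions, not the paper's method.
import itertools

def build_levels(exponent_subsets):
    """Enumerate all representable levels as sums of one term per subset.

    Each subset is a tuple of exponents; None encodes a zero term, so a
    level may use fewer than len(exponent_subsets) nonzero terms.
    """
    levels = set()
    for exps in itertools.product(*exponent_subsets):
        levels.add(sum(0.0 if e is None else 2.0 ** e for e in exps))
    return sorted(levels)

def quantize(x, levels):
    """Map x to the nearest representable level (sign handled separately)."""
    sign = -1.0 if x < 0 else 1.0
    return sign * min(levels, key=lambda lv: abs(lv - abs(x)))

# Two example subsets: coarse and fine power-of-two terms.
subsets = [(None, -1, -2), (None, -3, -4)]
levels = build_levels(subsets)
print(quantize(0.6, levels))   # snaps 0.6 to the nearest sum, 0.625
```

In this toy setting, updating the subset contents per layer (as the abstract describes) changes which sums are representable, so the level grid can be reshaped to match a layer's weight or activation distribution while every multiplication stays a pair of shifts plus an add.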
Pages: 1242-1255
Page count: 14