Design Space Exploration of Layer-Wise Mixed-Precision Quantization with Tightly Integrated Edge Inference Units

Cited by: 3

Authors:

Zhao, Xiaotian [1]

Gao, Yimin [2]

Verma, Vaibhav [2]

Xu, Ruge [1]

Stan, Mircea [2]

Guo, Xinfei [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China

[2] Univ Virginia, Charlottesville, VA USA

Source:

Proceedings of the Great Lakes Symposium on VLSI 2023, GLSVLSI 2023 | 2023

Keywords:

Mixed-precision quantization; Scalable architectures; Post-training quantization; Neural networks; Edge AI;

DOI:

10.1145/3583781.3590292

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Layer-wise mixed-precision quantization (MPQ) has become prevalent for edge inference since it strikes a better balance between accuracy and efficiency than uniform quantization. Existing MPQ strategies either lack hardware awareness or incur huge computation costs, which has limited their deployment at the edge. In this work, we propose a novel MPQ search algorithm that obtains an optimal scheme by "sampling" layer-wise sensitivity with respect to a newly proposed metric that incorporates both accuracy and a proxy of hardware cost. To further efficiently deploy post-training MPQ on edge chips, we propose to tightly integrate the quantized inference units as part of the processor pipeline through micro-architecture and Instruction Set Architecture (ISA) co-design. Evaluation results show that the proposed search algorithm achieves 3% to 11% higher inference accuracy with similar hardware cost compared to state-of-the-art MPQ strategies. In addition, the tightly integrated MPQ units achieve a speedup of 15.13x to 29.65x compared to a baseline RISC-V processor.
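
The abstract does not spell out the search algorithm or its metric, so the following is only a minimal illustrative sketch of the general idea of a sensitivity-driven, hardware-aware layer-wise bit-width search. It is not the authors' method. Everything in it is an assumption made for illustration: the function names (`quantize`, `quant_error`, `hw_cost`, `greedy_mpq`), the use of weight quantization error as a stand-in for accuracy loss, the use of "parameters times bits" as the hardware cost proxy, and the greedy strategy itself.

```python
import random

def quantize(weights, bits):
    """Symmetric uniform quantization of a list of floats (bits >= 2)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    levels = 2 ** (bits - 1) - 1                   # positive integer levels
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]

def quant_error(weights, bits):
    """Mean squared quantization error; an assumed proxy for accuracy loss."""
    q = quantize(weights, bits)
    return sum((w - x) ** 2 for w, x in zip(weights, q)) / len(weights)

def hw_cost(layers, assignment):
    """Assumed hardware cost proxy: total weight bits across all layers."""
    return sum(len(w) * assignment[name] for name, w in layers.items())

def greedy_mpq(layers, budget, candidate_bits=(8, 4, 2)):
    """Lower one layer's precision at a time until the cost fits the budget.

    Each step picks the layer whose next-lower bit width adds the least
    quantization error per bit of cost saved (a hypothetical sensitivity score).
    """
    bits_desc = sorted(candidate_bits, reverse=True)
    assignment = {name: bits_desc[0] for name in layers}
    while hw_cost(layers, assignment) > budget:
        best = None
        for name, w in layers.items():
            idx = bits_desc.index(assignment[name])
            if idx + 1 == len(bits_desc):
                continue  # this layer is already at the lowest precision
            lower = bits_desc[idx + 1]
            added_error = quant_error(w, lower) - quant_error(w, assignment[name])
            saved = len(w) * (assignment[name] - lower)
            score = added_error / saved
            if best is None or score < best[0]:
                best = (score, name, lower)
        if best is None:
            break  # budget is unreachable with the given candidate bit widths
        assignment[best[1]] = best[2]
    return assignment

# Toy example: four synthetic "layers" with different sizes and weight spreads.
random.seed(0)
layers = {
    f"layer{i}": [random.gauss(0, 1 + i) for _ in range(64 * (i + 1))]
    for i in range(4)
}
budget = 5 * sum(len(w) for w in layers.values())  # average of 5 bits per weight
assignment = greedy_mpq(layers, budget)
print(assignment, hw_cost(layers, assignment), budget)
```

In a real post-training flow the error term would more plausibly come from measured accuracy on a calibration set, and the cost term from a model of the target hardware; both are replaced here by simple stand-ins so the sketch stays self-contained.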

Citation

Pages: 467-471

Page count: 5

6 records in total

[1] Edge-MPQ: Layer-Wise Mixed-Precision Quantization With Tightly Integrated Versatile Inference Units for Edge Computing
Zhao, Xiaotian
Xu, Ruge
Gao, Yimin
Verma, Vaibhav
Stan, Mircea R.
Guo, Xinfei
[J]. IEEE Transactions on Computers, 2024, 73 (11) : 2504 - 2519
[2] Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance
Tang, Chen
Ouyang, Kai
Wang, Zhi
Zhu, Yifei
Ji, Wen
Wang, Yaowei
Zhu, Wenwu
[J]. COMPUTER VISION, ECCV 2022, PT XI, 2022, 13671 : 259 - 275
[3] SQNR-based Layer-wise Mixed-Precision Schemes with Computational Complexity Consideration
Kim, Ha-Na
Eun, Hyun
Choi, Jung Hwan
Kim, Ji-Hoon
[J]. 2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 234 - 235
[4] Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes
Risso, Matteo
Burrello, Alessio
Benini, Luca
Macii, Enrico
Poncino, Massimo
Pagliari, Daniele Jahier
[J]. 2022 IEEE 13TH INTERNATIONAL GREEN AND SUSTAINABLE COMPUTING CONFERENCE (IGSC), 2022, : 33 - 38
[5] Complexity-Aware Layer-Wise Mixed-Precision Schemes With SQNR-Based Fast Analysis
Kim, Hana
Eun, Hyun
Choi, Jung Hwan
Kim, Ji-Hoon
[J]. IEEE ACCESS, 2023, 11 : 117800 - 117809
[6] Design-Space Exploration of Mixed-precision DNN Accelerators based on Sum-Together Multipliers
Urbinati, Luca
Casu, Mario R.
[J]. 2023 18TH CONFERENCE ON PH.D RESEARCH IN MICROELECTRONICS AND ELECTRONICS, PRIME, 2023, : 377 - 380
