Mix-GEMM: Extending RISC-V CPUs for Energy-Efficient Mixed-Precision DNN Inference Using Binary Segmentation

Citations: 0
Authors
Fornt, Jordi [1,2]
Reggiani, Enrico [1]
Fontova-Musté, Pau [1]
Rodas, Narcís [1]
Pappalardo, Alessandro [3]
Unsal, Osman Sabri [1]
Cristal Kestelman, Adrián [1,2]
Altet, Josep [3]
Moll, Francesc [1,2]
Abella, Jaume [1,2]
Affiliations
[1] Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
[2] Universitat Politècnica de Catalunya (UPC), Barcelona, 08034, Spain
[3] Advanced Micro Devices (AMD), Dublin, D24 T683, Ireland
Keywords
Digital storage; Memory architecture; Program processors
DOI
10.1109/TC.2024.3500369
Abstract
Efficiently computing Deep Neural Networks (DNNs) has become a primary challenge in today's computers, especially on devices targeting mobile or edge applications. Recent progress on Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) has shown that the key to high energy efficiency lies in executing deep learning models at low (8- to 5-bit) or ultra-low (4- to 2-bit) precision. Unfortunately, current Central Processing Unit (CPU) architectures and Instruction Set Architectures (ISAs) severely limit the range of data sizes supported for computing DNN kernels. In this work, we present Mix-GEMM, a hardware-software co-designed architecture that enables RISC-V processors to efficiently compute arbitrary mixed-precision DNN kernels, supporting all data-size combinations from 8- to 2-bit. By applying binary segmentation, our architecture scales its throughput as the data size of the operands decreases, resulting in a flexible approach that leverages state-of-the-art QAT and PTQ to achieve high energy efficiency at very low cost. Evaluating Mix-GEMM in a dual-issue in-order RISC-V processor shows that it boosts performance and energy efficiency by up to 44× and 11×, respectively, with respect to the baseline processor, with an area overhead of only 2%. This allows the extended processor to execute state-of-the-art DNNs with significantly higher performance and energy efficiency than at standard FP32 precision, while retaining almost the same model accuracy.
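To illustrate the binary-segmentation idea named in the abstract: several narrow operands are packed into one wide machine word, and a single wide multiplication produces every partial product in its own bit segment, which is why throughput grows as operand precision shrinks. The C sketch below emulates this in software for a 4-element dot product of unsigned 4-bit operands; the element count N, segment width S, and the use of the unsigned __int128 extension available in GCC/Clang are assumptions of this example, not details of the paper's hardware datapath.

#include <stdint.h>
#include <stdio.h>

#define N 4   /* elements packed per word */
#define S 10  /* segment width: 4 + 4 + log2(4) bits, so sums cannot overflow */

/* Pack N small values into one word, one value per S-bit segment.
 * Packing the second operand in reverse order aligns all a[i]*b[i]
 * partial products onto the same segment of the wide product. */
static uint64_t pack(const uint8_t *v, int reverse) {
    uint64_t w = 0;
    for (int i = 0; i < N; i++) {
        int seg = reverse ? (N - 1 - i) : i;
        w |= (uint64_t)v[i] << (seg * S);
    }
    return w;
}

int main(void) {
    uint8_t a[N] = { 3, 7, 15, 1 };   /* unsigned 4-bit operands */
    uint8_t b[N] = { 9, 2, 5, 14 };

    /* A single 64x64 -> 128-bit multiply computes all N partial products. */
    unsigned __int128 prod = (unsigned __int128)pack(a, 0) * pack(b, 1);

    /* The dot product sum(a[i]*b[i]) accumulates in segment N-1. */
    uint32_t dot = (uint32_t)((prod >> ((N - 1) * S)) & ((1u << S) - 1));

    uint32_t ref = 0;                 /* plain scalar reference */
    for (int i = 0; i < N; i++) ref += (uint32_t)a[i] * b[i];

    printf("segmented: %u  reference: %u\n", dot, ref);  /* both print 130 */
    return 0;
}

Choosing S >= bits(a) + bits(b) + log2(N) guarantees that no per-segment sum can carry into a neighboring segment; the same constraint is what lets a fixed-width datapath pack more elements per register, and thus deliver more multiply-accumulates per instruction, as operand precision drops from 8- toward 2-bit.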
Pages: 582-596