Mix-GEMM: Extending RISC-V CPUs for Energy-Efficient Mixed-Precision DNN Inference Using Binary Segmentation

Citations: 0
Authors
Fornt, Jordi [1,2]
Reggiani, Enrico [1]
Fontova-Musté, Pau [1]
Rodas, Narcís [1]
Pappalardo, Alessandro [3]
Unsal, Osman Sabri [1]
Cristal Kestelman, Adrián [1,2]
Altet, Josep [3]
Moll, Francesc [1,2]
Abella, Jaume [1,2]
Affiliations
[1] Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
[2] Universitat Politècnica de Catalunya (UPC), Barcelona, 08034, Spain
[3] Advanced Micro Devices (AMD), Dublin, D24 T683, Ireland
Keywords
Digital storage; Memory architecture; Program processors
DOI
10.1109/TC.2024.3500369
Abstract
Efficiently computing Deep Neural Networks (DNNs) has become a primary challenge in today's computers, especially on devices targeting mobile or edge applications. Recent progress on Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) has shown that the key to high energy efficiency lies in executing deep learning models at low (8- to 5-bit) or ultra-low (4- to 2-bit) precision. Unfortunately, current Central Processing Unit (CPU) architectures and Instruction Set Architectures (ISAs) severely limit the range of data sizes supported for computing DNN kernels. In this work, we present Mix-GEMM, a hardware-software co-designed architecture that enables RISC-V processors to efficiently compute arbitrary mixed-precision DNN kernels, supporting all data-size combinations from 8- to 2-bit. By applying binary segmentation, our architecture scales its throughput as the data size of the operands decreases, resulting in a flexible approach that leverages state-of-the-art QAT and PTQ to achieve high energy efficiency at very low cost. Evaluating Mix-GEMM in a dual-issue in-order RISC-V processor shows that it boosts performance and energy efficiency by up to 44× and 11×, respectively, with respect to the baseline processor, with an area overhead of only 2%. This allows the extended processor to execute state-of-the-art DNNs with significantly higher performance and energy efficiency than at standard FP32 precision, while retaining almost the same model accuracy.
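To illustrate the binary-segmentation idea named in the abstract: several narrow operands are packed into one wide machine word, and a single wide multiplication produces every partial product in its own bit segment, which is why throughput grows as operand precision shrinks. The C sketch below emulates this in software for a 4-element dot product of unsigned 4-bit operands; the element count N, segment width S, and the use of the unsigned __int128 extension available in GCC/Clang are assumptions of this example, not details of the paper's hardware datapath.

#include <stdint.h>
#include <stdio.h>

#define N 4   /* elements packed per word */
#define S 10  /* segment width: 4 + 4 + log2(4) bits, so sums cannot overflow */

/* Pack N small values into one word, one value per S-bit segment.
 * Packing the second operand in reverse order aligns all a[i]*b[i]
 * partial products onto the same segment of the wide product. */
static uint64_t pack(const uint8_t *v, int reverse) {
    uint64_t w = 0;
    for (int i = 0; i < N; i++) {
        int seg = reverse ? (N - 1 - i) : i;
        w |= (uint64_t)v[i] << (seg * S);
    }
    return w;
}

int main(void) {
    uint8_t a[N] = { 3, 7, 15, 1 };   /* unsigned 4-bit operands */
    uint8_t b[N] = { 9, 2, 5, 14 };

    /* A single 64x64 -> 128-bit multiply computes all N partial products. */
    unsigned __int128 prod = (unsigned __int128)pack(a, 0) * pack(b, 1);

    /* The dot product sum(a[i]*b[i]) accumulates in segment N-1. */
    uint32_t dot = (uint32_t)((prod >> ((N - 1) * S)) & ((1u << S) - 1));

    uint32_t ref = 0;                 /* plain scalar reference */
    for (int i = 0; i < N; i++) ref += (uint32_t)a[i] * b[i];

    printf("segmented: %u  reference: %u\n", dot, ref);  /* both print 130 */
    return 0;
}

Choosing S >= bits(a) + bits(b) + log2(N) guarantees that no per-segment sum can carry into a neighboring segment; the same constraint is what lets a fixed-width datapath pack more elements per register, and thus deliver more multiply-accumulates per instruction, as operand precision drops from 8- toward 2-bit.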
Pages: 582-596