Design Space Exploration of Layer-Wise Mixed-Precision Quantization with Tightly Integrated Edge Inference Units

Cited by: 3
Authors
Zhao, Xiaotian [1]
Gao, Yimin [2]
Verma, Vaibhav [2]
Xu, Ruge [1]
Stan, Mircea [2]
Guo, Xinfei [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Univ Virginia, Charlottesville, VA USA
Keywords
Mixed-precision quantization; Scalable architectures; Post-training quantization; Neural networks; Edge AI
DOI
10.1145/3583781.3590292
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Layer-wise mixed-precision quantization (MPQ) has become prevalent for edge inference because it strikes a better balance between accuracy and efficiency than uniform quantization. Existing MPQ strategies either lack hardware awareness or incur large computation costs, which has hindered their deployment at the edge. In this work, we propose a novel MPQ search algorithm that obtains an optimal scheme by "sampling" layer-wise sensitivity with respect to a newly proposed metric that incorporates both accuracy and a proxy of hardware cost. To deploy post-training MPQ efficiently on edge chips, we further propose to tightly integrate the quantized inference units into the processor pipeline through micro-architecture and Instruction Set Architecture (ISA) co-design. Evaluation results show that the proposed search algorithm achieves 3% to 11% higher inference accuracy at similar hardware cost compared to state-of-the-art MPQ strategies. In addition, the tightly integrated MPQ units achieve a speedup of 15.13x to 29.65x over a baseline RISC-V processor.
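The abstract does not spell out the search algorithm or the metric, so the following is only a minimal sketch of the general idea of sensitivity sampling, not the authors' method. It assumes a hypothetical cost proxy (MAC count times bit-width), a hypothetical score (accuracy drop per unit of cost saved), and a toy stand-in for evaluating a quantized model. All function names, parameters, and numbers here are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def cost_proxy(macs: Sequence[int], bits: Sequence[int]) -> float:
    """Hypothetical hardware-cost proxy: MACs weighted by bit-width."""
    return float(sum(m * b for m, b in zip(macs, bits)))


def sample_sensitivity(
    macs: Sequence[int],
    accuracy_fn: Callable[[List[int]], float],
    base_bits: int = 8,
    low_bits: Sequence[int] = (4, 2),
) -> List[Tuple[float, int, int]]:
    """Lower one layer at a time and score accuracy drop per unit cost saved."""
    base = [base_bits] * len(macs)
    base_acc = accuracy_fn(base)
    base_cost = cost_proxy(macs, base)
    scores = []
    for layer in range(len(macs)):
        for b in low_bits:
            trial = list(base)
            trial[layer] = b
            acc_drop = base_acc - accuracy_fn(trial)
            saving = base_cost - cost_proxy(macs, trial)
            scores.append((acc_drop / saving, layer, b))
    # Smallest score first: cheapest accuracy loss per unit of cost saved.
    return sorted(scores)


def assign_bits(
    macs: Sequence[int],
    accuracy_fn: Callable[[List[int]], float],
    cost_budget: float,
    base_bits: int = 8,
    low_bits: Sequence[int] = (4, 2),
) -> List[int]:
    """Greedily lower bit-widths in score order until the budget is met."""
    bits = [base_bits] * len(macs)
    for _, layer, b in sample_sensitivity(macs, accuracy_fn, base_bits, low_bits):
        if cost_proxy(macs, bits) <= cost_budget:
            break
        if b < bits[layer]:
            bits[layer] = b
    return bits


# Toy stand-in for running a quantized network on a calibration set.
# The per-layer sensitivities below are made-up numbers for illustration.
TOY_SENSITIVITY = [0.02, 0.001, 0.005]


def toy_accuracy(bits: List[int]) -> float:
    return 0.9 - sum(s * (8 - b) ** 2 for s, b in zip(TOY_SENSITIVITY, bits))


macs = [100, 400, 200]                      # hypothetical per-layer MAC counts
budget = 0.6 * cost_proxy(macs, [8, 8, 8])  # target 60% of the 8-bit cost
bits = assign_bits(macs, toy_accuracy, budget)
print(bits)
```

In this toy setup the layer with many MACs and low sensitivity is quantized most aggressively, while the sensitive layer stays at 8 bits. A real search would replace `toy_accuracy` with an actual evaluation of the quantized network and the cost proxy with a hardware-derived estimate.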
Pages: 467-471
Page count: 5
Related papers
6 items in total
  • [1] Edge-MPQ: Layer-Wise Mixed-Precision Quantization With Tightly Integrated Versatile Inference Units for Edge Computing
    Zhao, Xiaotian
    Xu, Ruge
    Gao, Yimin
    Verma, Vaibhav
    Stan, Mircea R.
    Guo, Xinfei
    [J]. IEEE Transactions on Computers, 2024, 73 (11) : 2504 - 2519
  • [2] Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance
    Tang, Chen
    Ouyang, Kai
    Wang, Zhi
    Zhu, Yifei
    Ji, Wen
    Wang, Yaowei
    Zhu, Wenwu
    [J]. COMPUTER VISION, ECCV 2022, PT XI, 2022, 13671 : 259 - 275
  • [3] SQNR-based Layer-wise Mixed-Precision Schemes with Computational Complexity Consideration
    Kim, Ha-Na
    Eun, Hyun
    Choi, Jung Hwan
    Kim, Ji-Hoon
    [J]. 2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022 : 234 - 235
  • [4] Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes
    Risso, Matteo
    Burrello, Alessio
    Benini, Luca
    Macii, Enrico
    Poncino, Massimo
    Pagliari, Daniele Jahier
    [J]. 2022 IEEE 13TH INTERNATIONAL GREEN AND SUSTAINABLE COMPUTING CONFERENCE (IGSC), 2022 : 33 - 38
  • [5] Complexity-Aware Layer-Wise Mixed-Precision Schemes With SQNR-Based Fast Analysis
    Kim, Hana
    Eun, Hyun
    Choi, Jung Hwan
    Kim, Ji-Hoon
    [J]. IEEE ACCESS, 2023, 11 : 117800 - 117809
  • [6] Design-Space Exploration of Mixed-precision DNN Accelerators based on Sum-Together Multipliers
    Urbinati, Luca
    Casu, Mario R.
    [J]. 2023 18TH CONFERENCE ON PH.D RESEARCH IN MICROELECTRONICS AND ELECTRONICS, PRIME, 2023 : 377 - 380