Efficiently Emulating High-Bitwidth Computation with Low-Bitwidth Hardware

Cited by: 2
Authors
Ma, Zixuan [1]
Wang, Haojie [1]
Feng, Guanyu [1]
Zhang, Chen [1]
Xie, Lei [1]
He, Jiaao [1]
Chen, Shengqi [1]
Zhai, Jidong [1]
Affiliations
[1] Tsinghua University, Beijing, People's Republic of China
Funding
National Key R&D Program of China; National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Domain Specific Accelerator; Emulation; Tensor Core;
DOI
10.1145/3524059.3532377
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Domain-Specific Accelerators (DSAs) are being rapidly developed to support high-performance domain-specific computation. Although DSAs provide massive computation capability, they often support only a limited set of native data types. To mitigate this problem, previous works have explored software emulation for certain data types, which partially compensates for the hardware limitation. However, how to efficiently design more emulated data types and choose a high-performance one for a given application, without hurting correctness or precision, remains an open problem. To address these challenges, we present Ape, which can 1) provide different strategies for emulating high-bitwidth data types with native data types, backed by in-depth error analysis; and 2) dynamically and automatically select proper data types and generate efficient code for a given computation at fine granularity, achieving higher performance while maintaining both correctness and precision without human effort. We implement Ape on both NVIDIA Tensor Core and Huawei Ascend. Results show that Ape boosts General Matrix Multiplication and convolution on Tensor Cores by up to 3.12x and 1.86x, respectively, over CUDA Cores, and accelerates various applications by up to 1.78x (1.65x on average).
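As a rough illustration of the emulation strategy described in the abstract (splitting a high-bitwidth operand into several native low-bitwidth pieces and recombining their partial products in a wider accumulator), the Python sketch below emulates an FP32 dot product using only FP16 multiplicands. This is a simplified sketch under assumed conventions, not Ape's actual kernels: the helper names split_fp32_to_fp16 and emulated_fp32_dot are illustrative, and the paper's error analysis and automatic data-type selection are not reproduced here.

```python
# Minimal sketch (not Ape's kernels): emulate an FP32 dot product with
# FP16 multiplicands and an FP32 accumulator, i.e. the general
# "split into native low-bitwidth pieces, recombine partial products" idea.
import numpy as np


def split_fp32_to_fp16(x):
    """Split an FP32 array so that x ~= hi + lo, with hi and lo both FP16."""
    hi = x.astype(np.float16)
    lo = (x - hi.astype(np.float32)).astype(np.float16)  # residual bits of x
    return hi, lo


def emulated_fp32_dot(a, b):
    """Approximate the FP32 dot product a.b using only FP16 multiplicands.

    The a_lo*b_lo cross term is dropped because its contribution falls
    below FP32 precision, so three low-bitwidth products per element suffice.
    """
    a_hi, a_lo = split_fp32_to_fp16(a)
    b_hi, b_lo = split_fp32_to_fp16(b)
    # On real hardware, each line below would map to an FP16-input,
    # FP32-accumulate multiply-accumulate (e.g. a Tensor Core MMA).
    acc = np.sum(a_hi.astype(np.float32) * b_hi.astype(np.float32))
    acc += np.sum(a_hi.astype(np.float32) * b_lo.astype(np.float32))
    acc += np.sum(a_lo.astype(np.float32) * b_hi.astype(np.float32))
    return acc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal(4096, dtype=np.float32)
    b = rng.standard_normal(4096, dtype=np.float32)
    exact = float(np.dot(a.astype(np.float64), b.astype(np.float64)))
    approx = float(emulated_fp32_dot(a, b))
    print("fp64 reference:", exact)
    print("emulated fp32 :", approx)
    print("relative error:", abs(approx - exact) / abs(exact))
```

Dropping the lo*lo term keeps the cost at three low-bitwidth products instead of four; deciding, per computation, whether such a scheme still meets the precision requirement is exactly the role of the error analysis and automatic data-type selection described above.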
Pages: 12