Efficiently Emulating High-Bitwidth Computation with Low-Bitwidth Hardware

Cited by: 2
Authors
Ma, Zixuan [1]
Wang, Haojie [1]
Feng, Guanyu [1]
Zhang, Chen [1]
Xie, Lei [1]
He, Jiaao [1]
Chen, Shengqi [1]
Zhai, Jidong [1]
Affiliations
[1] Tsinghua University, Beijing, People's Republic of China
Funding
National Key R&D Program of China; National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Domain Specific Accelerator; Emulation; Tensor Core;
DOI
10.1145/3524059.3532377
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Domain-Specific Accelerators (DSAs) are being rapidly developed to support high-performance domain-specific computation. Although DSAs provide massive computation capability, they often support only a limited set of native data types. To mitigate this problem, previous works have explored software emulation for certain data types, which partially compensates for the hardware limitation. However, how to efficiently design more emulated data types and choose a high-performance one for a given application, without hurting correctness or precision, remains an open problem. To address these challenges, we present Ape, which can 1) provide different strategies for emulating high-bitwidth data types with native data types, backed by in-depth error analysis; and 2) dynamically and automatically select proper data types and generate efficient code for a given computation at fine granularity, achieving higher performance while maintaining both correctness and precision without human effort. We implement Ape on both NVIDIA Tensor Core and Huawei Ascend. Results show that Ape boosts General Matrix Multiplication and convolution on Tensor Cores by up to 3.12x and 1.86x, respectively, over CUDA Cores, and accelerates various applications by up to 1.78x (1.65x on average).
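As a rough illustration of the emulation strategy described in the abstract (splitting a high-bitwidth operand into several native low-bitwidth pieces and recombining their partial products in a wider accumulator), the Python sketch below emulates an FP32 dot product using only FP16 multiplicands. This is a simplified sketch under assumed conventions, not Ape's actual kernels: the helper names split_fp32_to_fp16 and emulated_fp32_dot are illustrative, and the paper's error analysis and automatic data-type selection are not reproduced here.

```python
# Minimal sketch (not Ape's kernels): emulate an FP32 dot product with
# FP16 multiplicands and an FP32 accumulator, i.e. the general
# "split into native low-bitwidth pieces, recombine partial products" idea.
import numpy as np


def split_fp32_to_fp16(x):
    """Split an FP32 array so that x ~= hi + lo, with hi and lo both FP16."""
    hi = x.astype(np.float16)
    lo = (x - hi.astype(np.float32)).astype(np.float16)  # residual bits of x
    return hi, lo


def emulated_fp32_dot(a, b):
    """Approximate the FP32 dot product a.b using only FP16 multiplicands.

    The a_lo*b_lo cross term is dropped because its contribution falls
    below FP32 precision, so three low-bitwidth products per element suffice.
    """
    a_hi, a_lo = split_fp32_to_fp16(a)
    b_hi, b_lo = split_fp32_to_fp16(b)
    # On real hardware, each line below would map to an FP16-input,
    # FP32-accumulate multiply-accumulate (e.g. a Tensor Core MMA).
    acc = np.sum(a_hi.astype(np.float32) * b_hi.astype(np.float32))
    acc += np.sum(a_hi.astype(np.float32) * b_lo.astype(np.float32))
    acc += np.sum(a_lo.astype(np.float32) * b_hi.astype(np.float32))
    return acc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal(4096, dtype=np.float32)
    b = rng.standard_normal(4096, dtype=np.float32)
    exact = float(np.dot(a.astype(np.float64), b.astype(np.float64)))
    approx = float(emulated_fp32_dot(a, b))
    print("fp64 reference:", exact)
    print("emulated fp32 :", approx)
    print("relative error:", abs(approx - exact) / abs(exact))
```

Dropping the lo*lo term keeps the cost at three low-bitwidth products instead of four; deciding, per computation, whether such a scheme still meets the precision requirement is exactly the role of the error analysis and automatic data-type selection described above.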
Pages: 12