Neural Acceleration for GPU Throughput Processors

Cited by: 56
Authors
Yazdanbakhsh, Amir [1 ]
Park, Jongse [1 ]
Sharma, Hardik [1 ]
Lotfi-Kamran, Pejman [2 ]
Esmaeilzadeh, Hadi [1 ]
Affiliations
[1] Georgia Inst Technol, Alternat Comp Technol ACT Lab, Atlanta, GA 30332 USA
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
Funding
US National Science Foundation;
Keywords
Approximate computing; GPU; neural processing unit;
DOI
10.1145/2830772.2830810
Chinese Library Classification (CLC): TP3 [Computing technology; computer technology]
Discipline code: 0812
Abstract
Graphics Processing Units (GPUs) can accelerate diverse classes of applications, such as recognition, gaming, data analytics, weather prediction, and multimedia. Many of these applications are amenable to approximate execution. This application characteristic provides an opportunity to improve GPU performance and efficiency. Among approximation techniques, neural accelerators have been shown to provide significant performance and efficiency gains when augmenting CPU processors. However, the integration of neural accelerators within a GPU processor has remained unexplored. GPUs are, in a sense, many-core accelerators that exploit large degrees of data-level parallelism in applications through the SIMT execution model. This paper aims to harmoniously bring neural and GPU accelerators together without hindering SIMT execution or adding excessive hardware overhead. We introduce a low-overhead neurally accelerated architecture for GPUs, called NGPU, that enables scalable integration of neural accelerators for a large number of GPU cores. This work also devises a mechanism that controls the tradeoff between the quality of results and the benefits from neural acceleration. Compared to the baseline GPU architecture, cycle-accurate simulation results for NGPU show a 2.4x average speedup and a 2.8x average energy reduction within a 10% quality-loss margin across a diverse set of benchmarks. The proposed quality control mechanism retains a 1.9x average speedup and a 2.1x average energy reduction while reducing the degradation in the quality of results to 2.5%. These benefits are achieved with less than 1% area overhead.
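The abstract's two core ideas — replacing an approximable code region with a cheap learned surrogate, and a quality-control knob that falls back to precise execution when the approximation error exceeds a margin — can be sketched in pure Python. This is an illustrative sketch only: the function names, the piecewise-linear surrogate (standing in for the paper's trained neural network), and the per-call error check (which NGPU would do in hardware, not per invocation) are all assumptions, not the paper's implementation.

```python
import math

def precise_sigmoid(x):
    # Exact transcendental computation: the "approximable region"
    # that a neural accelerator would replace.
    return 1.0 / (1.0 + math.exp(-x))

def neural_sigmoid(x):
    # Stand-in for a tiny trained MLP: a cheap piecewise-linear
    # surrogate. (Illustrative only; NGPU uses a hardware neural
    # accelerator trained offline on the target region.)
    if x < -4.0:
        return 0.0
    if x > 4.0:
        return 1.0
    return 0.5 + x / 8.0

def invoke(x, quality_threshold=0.05):
    # Quality-control sketch: accept the approximate result only
    # when its error is within the quality-loss margin; otherwise
    # fall back to precise execution. The threshold plays the role
    # of the paper's quality/benefit tradeoff knob.
    approx = neural_sigmoid(x)
    exact = precise_sigmoid(x)  # hardware would sample this check, not run it per call
    if abs(approx - exact) <= quality_threshold:
        return approx
    return exact
```

Tightening `quality_threshold` toward 0 forces more fallbacks (higher quality, less speedup); loosening it accepts more surrogate results — the same direction of tradeoff the abstract quantifies (2.4x speedup at 10% quality loss vs. 1.9x at 2.5%).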
Pages: 482-493 (12 pages)
Related Papers (50 total)
  • [1] Acceleration of Neural Network Inference for Embedded GPU Systems
    Terakura, Kei
    Chang, Qiong
    Miyazaki, Jun
    2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, IEEE BIGCOMP 2024, 2024, : 361 - 362
  • [2] Multifold Acceleration of Neural Network Computations Using GPU
    Guzhva, Alexander
    Dolenko, Sergey
    Persiantsev, Igor
    ARTIFICIAL NEURAL NETWORKS - ICANN 2009, PT I, 2009, 5768 : 373 - 380
  • [3] VectorVisor: A Binary Translation Scheme for Throughput-Oriented GPU Acceleration
    Ginzburg, Samuel
    Shahrad, Mohammad
    Freedman, Michael J.
    PROCEEDINGS OF THE 2023 USENIX ANNUAL TECHNICAL CONFERENCE, 2023, : 1017 - 1037
  • [4] High throughput acceleration of NIST lightweight authenticated encryption schemes on GPU platform
    Chan, Jia-Lin
    Lee, Wai-Kong
    Wong, Denis C.-K.
    Yap, Wun-She
    Ooi, Boon-Yaik
    Goi, Bok-Min
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (08): 11213 - 11235
  • [5] High Throughput Neural Network based Embedded Streaming Multicore Processors
    Hasan, Raqibul
    Taha, Tarek M.
    Yakopcic, Chris
    Mountain, David J.
    2016 IEEE INTERNATIONAL CONFERENCE ON REBOOTING COMPUTING (ICRC), 2016
  • [6] Collective behavior of large-scale neural networks with GPU acceleration
    Qu, Jingyi
    Wang, Rubin
    COGNITIVE NEURODYNAMICS, 2017, 11 (06) : 553 - 563
  • [8] Optimized GPU Acceleration Algorithm of Convolutional Neural Networks for Target Detection
    Li, Shijie
    Dou, Yong
    Lv, Qi
    Wang, Qiang
    Niu, Xin
    Yang, Ke
    PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2016, : 224 - 230
  • [9] Simultaneous Multikernel GPU: Multi-tasking Throughput Processors via Fine-Grained Sharing
    Wang, Zhenning
    Yang, Jun
    Melhem, Rami
    Childers, Bruce
    Zhang, Youtao
    Guo, Minyi
    PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA-22), 2016, : 358 - 369
  • [10] Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors
    Han, Bing
    Taha, Tarek M.
    APPLIED OPTICS, 2010, 49 (10) : B83 - B91