Neural Acceleration for GPU Throughput Processors

Cited by: 56
Authors
Yazdanbakhsh, Amir [1]
Park, Jongse [1]
Sharma, Hardik [1]
Lotfi-Kamran, Pejman [2]
Esmaeilzadeh, Hadi [1]
Affiliations
[1] Georgia Institute of Technology, Alternative Computing Technologies (ACT) Lab, Atlanta, GA 30332, USA
[2] Institute for Research in Fundamental Sciences (IPM), School of Computer Science, Tehran, Iran
Funding
U.S. National Science Foundation
Keywords
Approximate computing; GPU; neural processing unit;
DOI
10.1145/2830772.2830810
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Graphics Processing Units (GPUs) can accelerate diverse classes of applications, such as recognition, gaming, data analytics, weather prediction, and multimedia. Many of these applications are amenable to approximate execution, which provides an opportunity to improve GPU performance and efficiency. Among approximation techniques, neural accelerators have been shown to deliver significant performance and efficiency gains when augmenting CPU processors. However, the integration of neural accelerators within a GPU processor has remained unexplored. GPUs are, in a sense, many-core accelerators that exploit large degrees of data-level parallelism in applications through the SIMT execution model. This paper aims to harmoniously bring neural and GPU accelerators together without hindering SIMT execution or adding excessive hardware overhead. We introduce a low-overhead neurally accelerated architecture for GPUs, called NGPU, that enables scalable integration of neural accelerators across a large number of GPU cores. This work also devises a mechanism that controls the tradeoff between the quality of results and the benefits from neural acceleration. Compared to the baseline GPU architecture, cycle-accurate simulation results for NGPU show a 2.4x average speedup and a 2.8x average energy reduction within a 10% quality-loss margin across a diverse set of benchmarks. The proposed quality control mechanism retains a 1.9x average speedup and a 2.1x energy reduction while limiting the degradation in result quality to 2.5%. These benefits are achieved with less than 1% area overhead.
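To make the idea in the abstract concrete, below is a minimal CUDA sketch of neural acceleration with a quality knob: an approximable code region is replaced, per thread, by a small multilayer perceptron, and an invoke_rate parameter stands in for the paper's quality-control mechanism by deciding which threads take the approximate path. Note that NGPU realizes the neural accelerator in hardware inside the GPU's streaming multiprocessors; the MLP topology, the symbols W1, b1, W2, b2, f_precise, and invoke_rate here are hypothetical illustrations, not the paper's actual interfaces or parameters.

    // Illustrative sketch only: mimics neural acceleration in software.
    // The 2-8-1 MLP, its weights, the target function, and the quality knob
    // are hypothetical placeholders, not taken from the paper.
    #include <cstdio>
    #include <cmath>
    #include <cuda_runtime.h>

    #define HIDDEN 8

    // Hypothetical trained MLP parameters; in practice they would be filled in
    // with cudaMemcpyToSymbol after offline training on input/output pairs.
    __constant__ float W1[HIDDEN][2];
    __constant__ float b1[HIDDEN];
    __constant__ float W2[HIDDEN];
    __constant__ float b2;

    // Precise version of the approximable code region (a stand-in example).
    __device__ float f_precise(float x, float y) {
        return sqrtf(x * x + y * y);
    }

    // Neural approximation of the same region: one sigmoid hidden layer, linear output.
    __device__ float f_neural(float x, float y) {
        float out = b2;
        for (int i = 0; i < HIDDEN; ++i) {
            float h = W1[i][0] * x + W1[i][1] * y + b1[i];
            h = 1.0f / (1.0f + expf(-h));   // sigmoid activation
            out += W2[i] * h;
        }
        return out;
    }

    // Each thread either executes the precise region or "invokes" the neural stand-in.
    // invoke_rate in [0, 1] acts as the quality knob: lowering it routes more threads
    // to the precise path, trading speedup and energy savings for output quality.
    __global__ void approx_kernel(const float* xs, const float* ys, float* out,
                                  int n, float invoke_rate) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        bool approximate = ((i % 100) / 100.0f) < invoke_rate;  // deterministic sampling
        out[i] = approximate ? f_neural(xs[i], ys[i]) : f_precise(xs[i], ys[i]);
    }

    int main() {
        const int n = 1024;
        float *xs, *ys, *out;
        cudaMallocManaged(&xs, n * sizeof(float));
        cudaMallocManaged(&ys, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; ++i) { xs[i] = 0.01f * i; ys[i] = 1.0f - 0.01f * i; }
        approx_kernel<<<(n + 255) / 256, 256>>>(xs, ys, out, n, 0.9f);  // 90% approximated
        cudaDeviceSynchronize();
        printf("out[0] = %f, out[n-1] = %f\n", out[0], out[n - 1]);
        cudaFree(xs); cudaFree(ys); cudaFree(out);
        return 0;
    }

Sweeping invoke_rate between 0 and 1 in such a sketch traces out the same kind of quality-versus-speedup tradeoff that the paper's quality-control mechanism navigates in hardware.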
Pages: 482-493 (12 pages)