Neural Acceleration for GPU Throughput Processors

Cited by: 56
Authors
Yazdanbakhsh, Amir [1]
Park, Jongse [1]
Sharma, Hardik [1]
Lotfi-Kamran, Pejman [2]
Esmaeilzadeh, Hadi [1]
Affiliations
[1] Georgia Institute of Technology, Alternative Computing Technologies (ACT) Lab, Atlanta, GA 30332, USA
[2] Institute for Research in Fundamental Sciences (IPM), School of Computer Science, Tehran, Iran
Funding
U.S. National Science Foundation
Keywords
Approximate computing; GPU; neural processing unit;
DOI
10.1145/2830772.2830810
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Graphics Processing Units (GPUs) can accelerate diverse classes of applications, such as recognition, gaming, data analytics, weather prediction, and multimedia. Many of these applications are amenable to approximate execution, which provides an opportunity to improve GPU performance and efficiency. Among approximation techniques, neural accelerators have been shown to deliver significant performance and efficiency gains when augmenting CPU processors. However, the integration of neural accelerators within a GPU processor has remained unexplored. GPUs are, in a sense, many-core accelerators that exploit large degrees of data-level parallelism in applications through the SIMT execution model. This paper aims to harmoniously bring neural and GPU accelerators together without hindering SIMT execution or adding excessive hardware overhead. We introduce a low-overhead neurally accelerated architecture for GPUs, called NGPU, that enables scalable integration of neural accelerators across a large number of GPU cores. This work also devises a mechanism that controls the tradeoff between the quality of results and the benefits from neural acceleration. Compared to the baseline GPU architecture, cycle-accurate simulation results for NGPU show a 2.4x average speedup and a 2.8x average energy reduction within a 10% quality-loss margin across a diverse set of benchmarks. The proposed quality control mechanism retains a 1.9x average speedup and a 2.1x energy reduction while limiting the degradation in result quality to 2.5%. These benefits are achieved with less than 1% area overhead.
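To make the idea in the abstract concrete, below is a minimal CUDA sketch of neural acceleration with a quality knob: an approximable code region is replaced, per thread, by a small multilayer perceptron, and an invoke_rate parameter stands in for the paper's quality-control mechanism by deciding which threads take the approximate path. Note that NGPU realizes the neural accelerator in hardware inside the GPU's streaming multiprocessors; the MLP topology, the symbols W1, b1, W2, b2, f_precise, and invoke_rate here are hypothetical illustrations, not the paper's actual interfaces or parameters.

    // Illustrative sketch only: mimics neural acceleration in software.
    // The 2-8-1 MLP, its weights, the target function, and the quality knob
    // are hypothetical placeholders, not taken from the paper.
    #include <cstdio>
    #include <cmath>
    #include <cuda_runtime.h>

    #define HIDDEN 8

    // Hypothetical trained MLP parameters; in practice they would be filled in
    // with cudaMemcpyToSymbol after offline training on input/output pairs.
    __constant__ float W1[HIDDEN][2];
    __constant__ float b1[HIDDEN];
    __constant__ float W2[HIDDEN];
    __constant__ float b2;

    // Precise version of the approximable code region (a stand-in example).
    __device__ float f_precise(float x, float y) {
        return sqrtf(x * x + y * y);
    }

    // Neural approximation of the same region: one sigmoid hidden layer, linear output.
    __device__ float f_neural(float x, float y) {
        float out = b2;
        for (int i = 0; i < HIDDEN; ++i) {
            float h = W1[i][0] * x + W1[i][1] * y + b1[i];
            h = 1.0f / (1.0f + expf(-h));   // sigmoid activation
            out += W2[i] * h;
        }
        return out;
    }

    // Each thread either executes the precise region or "invokes" the neural stand-in.
    // invoke_rate in [0, 1] acts as the quality knob: lowering it routes more threads
    // to the precise path, trading speedup and energy savings for output quality.
    __global__ void approx_kernel(const float* xs, const float* ys, float* out,
                                  int n, float invoke_rate) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        bool approximate = ((i % 100) / 100.0f) < invoke_rate;  // deterministic sampling
        out[i] = approximate ? f_neural(xs[i], ys[i]) : f_precise(xs[i], ys[i]);
    }

    int main() {
        const int n = 1024;
        float *xs, *ys, *out;
        cudaMallocManaged(&xs, n * sizeof(float));
        cudaMallocManaged(&ys, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; ++i) { xs[i] = 0.01f * i; ys[i] = 1.0f - 0.01f * i; }
        approx_kernel<<<(n + 255) / 256, 256>>>(xs, ys, out, n, 0.9f);  // 90% approximated
        cudaDeviceSynchronize();
        printf("out[0] = %f, out[n-1] = %f\n", out[0], out[n - 1]);
        cudaFree(xs); cudaFree(ys); cudaFree(out);
        return 0;
    }

Sweeping invoke_rate between 0 and 1 in such a sketch traces out the same kind of quality-versus-speedup tradeoff that the paper's quality-control mechanism navigates in hardware.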
Pages: 482-493 (12 pages)