Neural Acceleration for GPU Throughput Processors

Cited by: 56
Authors
Yazdanbakhsh, Amir [1 ]
Park, Jongse [1 ]
Sharma, Hardik [1 ]
Lotfi-Kamran, Pejman [2 ]
Esmaeilzadeh, Hadi [1 ]
Affiliations
[1] Georgia Inst Technol, Alternat Comp Technol ACT Lab, Atlanta, GA 30332 USA
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
Funding
US National Science Foundation;
Keywords
Approximate computing; GPU; neural processing unit;
DOI
10.1145/2830772.2830810
Chinese Library Classification (CLC): TP3 [Computing technology; computer technology]
Discipline code: 0812
Abstract
Graphics Processing Units (GPUs) can accelerate diverse classes of applications, such as recognition, gaming, data analytics, weather prediction, and multimedia. Many of these applications are amenable to approximate execution. This application characteristic provides an opportunity to improve GPU performance and efficiency. Among approximation techniques, neural accelerators have been shown to provide significant performance and efficiency gains when augmenting CPU processors. However, the integration of neural accelerators within a GPU processor has remained unexplored. GPUs are, in a sense, many-core accelerators that exploit large degrees of data-level parallelism in applications through the SIMT execution model. This paper aims to harmoniously bring neural and GPU accelerators together without hindering SIMT execution or adding excessive hardware overhead. We introduce a low-overhead neurally accelerated architecture for GPUs, called NGPU, that enables scalable integration of neural accelerators for a large number of GPU cores. This work also devises a mechanism that controls the tradeoff between the quality of results and the benefits from neural acceleration. Compared to the baseline GPU architecture, cycle-accurate simulation results for NGPU show a 2.4x average speedup and a 2.8x average energy reduction within a 10% quality-loss margin across a diverse set of benchmarks. The proposed quality control mechanism retains a 1.9x average speedup and a 2.1x average energy reduction while reducing the degradation in the quality of results to 2.5%. These benefits are achieved with less than 1% area overhead.
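The abstract's two core ideas — replacing an approximable code region with a cheap learned surrogate, and a quality-control knob that falls back to precise execution when the approximation error exceeds a margin — can be sketched in pure Python. This is an illustrative sketch only: the function names, the piecewise-linear surrogate (standing in for the paper's trained neural network), and the per-call error check (which NGPU would do in hardware, not per invocation) are all assumptions, not the paper's implementation.

```python
import math

def precise_sigmoid(x):
    # Exact transcendental computation: the "approximable region"
    # that a neural accelerator would replace.
    return 1.0 / (1.0 + math.exp(-x))

def neural_sigmoid(x):
    # Stand-in for a tiny trained MLP: a cheap piecewise-linear
    # surrogate. (Illustrative only; NGPU uses a hardware neural
    # accelerator trained offline on the target region.)
    if x < -4.0:
        return 0.0
    if x > 4.0:
        return 1.0
    return 0.5 + x / 8.0

def invoke(x, quality_threshold=0.05):
    # Quality-control sketch: accept the approximate result only
    # when its error is within the quality-loss margin; otherwise
    # fall back to precise execution. The threshold plays the role
    # of the paper's quality/benefit tradeoff knob.
    approx = neural_sigmoid(x)
    exact = precise_sigmoid(x)  # hardware would sample this check, not run it per call
    if abs(approx - exact) <= quality_threshold:
        return approx
    return exact
```

Tightening `quality_threshold` toward 0 forces more fallbacks (higher quality, less speedup); loosening it accepts more surrogate results — the same direction of tradeoff the abstract quantifies (2.4x speedup at 10% quality loss vs. 1.9x at 2.5%).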
Pages: 482-493 (12 pages)
Related Papers (50 total)
  • [1] Acceleration of Neural Network Inference for Embedded GPU Systems
    Terakura, Kei
    Chang, Qiong
    Miyazaki, Jun
    2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, IEEE BIGCOMP 2024, 2024, : 361 - 362
  • [2] Multifold Acceleration of Neural Network Computations Using GPU
    Guzhva, Alexander
    Dolenko, Sergey
    Persiantsev, Igor
    ARTIFICIAL NEURAL NETWORKS - ICANN 2009, PT I, 2009, 5768 : 373 - 380
  • [3] VectorVisor: A Binary Translation Scheme for Throughput-Oriented GPU Acceleration
    Ginzburg, Samuel
    Shahrad, Mohammad
    Freedman, Michael J.
    PROCEEDINGS OF THE 2023 USENIX ANNUAL TECHNICAL CONFERENCE, 2023, : 1017 - 1037
  • [4] High throughput acceleration of NIST lightweight authenticated encryption schemes on GPU platform
    Chan, Jia-Lin
    Lee, Wai-Kong
    Wong, Denis C.-K.
    Yap, Wun-She
    Ooi, Boon-Yaik
    Goi, Bok-Min
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (08): 11213 - 11235
  • [5] High Throughput Neural Network based Embedded Streaming Multicore Processors
    Hasan, Raqibul
    Taha, Tarek M.
    Yakopcic, Chris
    Mountain, David J.
    2016 IEEE INTERNATIONAL CONFERENCE ON REBOOTING COMPUTING (ICRC), 2016
  • [6] Collective behavior of large-scale neural networks with GPU acceleration
    Qu, Jingyi
    Wang, Rubin
    COGNITIVE NEURODYNAMICS, 2017, 11 (06) : 553 - 563
  • [8] Optimized GPU Acceleration Algorithm of Convolutional Neural Networks for Target Detection
    Li, Shijie
    Dou, Yong
    Lv, Qi
    Wang, Qiang
    Niu, Xin
    Yang, Ke
    PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2016, : 224 - 230
  • [9] Simultaneous Multikernel GPU: Multi-tasking Throughput Processors via Fine-Grained Sharing
    Wang, Zhenning
    Yang, Jun
    Melhem, Rami
    Childers, Bruce
    Zhang, Youtao
    Guo, Minyi
    PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA-22), 2016, : 358 - 369
  • [10] Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors
    Han, Bing
    Taha, Tarek M.
    APPLIED OPTICS, 2010, 49 (10) : B83 - B91