Neural Acceleration for GPU Throughput Processors

Cited by: 56
Authors:
Yazdanbakhsh, Amir [1 ]
Park, Jongse [1 ]
Sharma, Hardik [1 ]
Lotfi-Kamran, Pejman [2 ]
Esmaeilzadeh, Hadi [1 ]
Affiliations:
[1] Georgia Inst Technol, Alternat Comp Technol ACT Lab, Atlanta, GA 30332 USA
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
Funding:
US National Science Foundation (NSF)
Keywords:
Approximate computing; GPU; neural processing unit;
DOI: 10.1145/2830772.2830810
Chinese Library Classification (CLC): TP3 [Computing technology, computer technology]
Discipline code: 0812
Abstract:
Graphics Processing Units (GPUs) can accelerate diverse classes of applications, such as recognition, gaming, data analytics, weather prediction, and multimedia. Many of these applications are amenable to approximate execution. This application characteristic provides an opportunity to improve GPU performance and efficiency. Among approximation techniques, neural accelerators have been shown to provide significant performance and efficiency gains when augmenting CPU processors. However, the integration of neural accelerators within a GPU processor has remained unexplored. GPUs are, in a sense, many-core accelerators that exploit large degrees of data-level parallelism in applications through the SIMT execution model. This paper aims to harmoniously bring neural and GPU accelerators together without hindering SIMT execution or adding excessive hardware overhead. We introduce a low-overhead neurally accelerated architecture for GPUs, called NGPU, that enables scalable integration of neural accelerators for a large number of GPU cores. This work also devises a mechanism that controls the tradeoff between the quality of results and the benefits from neural acceleration. Compared to the baseline GPU architecture, cycle-accurate simulation results for NGPU show a 2.4x average speedup and a 2.8x average energy reduction within a 10% quality-loss margin across a diverse set of benchmarks. The proposed quality control mechanism retains a 1.9x average speedup and a 2.1x average energy reduction while reducing the degradation in result quality to 2.5%. These benefits are achieved with less than 1% area overhead.
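The quality-control mechanism the abstract describes — trading result quality against the gains from neural approximation — can be illustrated with a minimal sketch. This is not the paper's actual hardware mechanism: `precise_kernel`, `neural_approx` (a fixed polynomial standing in for a trained neural network), and the sampling-based calibration scheme are all illustrative assumptions.

```python
import math

# Hypothetical stand-in for an approximable GPU kernel (the "precise" path).
def precise_kernel(x):
    return math.sin(x) * math.exp(-x * x)

# Fixed low-order polynomial standing in for the small trained neural
# network that a neural accelerator would evaluate instead of the kernel.
# Near x = 0:  sin(x) * exp(-x^2)  ~  x - (7/6) x^3.
def neural_approx(x):
    return x - (7.0 / 6.0) * x ** 3

def quality_controlled(xs, quality_threshold=0.05, calibration=0.1):
    """Run the cheap approximate path, but for a sampled fraction of
    invocations also run the precise path and measure the relative
    error; if the error exceeds the quality threshold, emit the precise
    result instead. The threshold is the user-visible quality knob."""
    stride = int(1.0 / calibration)  # check every `stride`-th invocation
    out = []
    for i, x in enumerate(xs):
        approx = neural_approx(x)
        if i % stride == 0:  # calibration sample: compare against precise
            exact = precise_kernel(x)
            rel_err = abs(approx - exact) / (abs(exact) + 1e-12)
            if rel_err > quality_threshold:
                out.append(exact)  # quality too low: fall back to precise
                continue
        out.append(approx)
    return out
```

Tightening `quality_threshold` shifts more invocations onto the precise path, mirroring the paper's reported tradeoff (2.4x speedup at a 10% quality-loss margin versus 1.9x at 2.5% degradation).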
Pages: 482-493
Page count: 12