Neural Acceleration for GPU Throughput Processors

Cited by: 56
Authors:
Yazdanbakhsh, Amir [1 ]
Park, Jongse [1 ]
Sharma, Hardik [1 ]
Lotfi-Kamran, Pejman [2 ]
Esmaeilzadeh, Hadi [1 ]
Affiliations:
[1] Georgia Inst Technol, Alternat Comp Technol ACT Lab, Atlanta, GA 30332 USA
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
Funding:
US National Science Foundation (NSF)
Keywords:
Approximate computing; GPU; neural processing unit;
DOI: 10.1145/2830772.2830810
Chinese Library Classification (CLC): TP3 [Computing technology, computer technology]
Discipline code: 0812
Abstract:
Graphics Processing Units (GPUs) can accelerate diverse classes of applications, such as recognition, gaming, data analytics, weather prediction, and multimedia. Many of these applications are amenable to approximate execution. This application characteristic provides an opportunity to improve GPU performance and efficiency. Among approximation techniques, neural accelerators have been shown to provide significant performance and efficiency gains when augmenting CPU processors. However, the integration of neural accelerators within a GPU processor has remained unexplored. GPUs are, in a sense, many-core accelerators that exploit large degrees of data-level parallelism in applications through the SIMT execution model. This paper aims to harmoniously bring neural and GPU accelerators together without hindering SIMT execution or adding excessive hardware overhead. We introduce a low-overhead neurally accelerated architecture for GPUs, called NGPU, that enables scalable integration of neural accelerators for a large number of GPU cores. This work also devises a mechanism that controls the tradeoff between the quality of results and the benefits from neural acceleration. Compared to the baseline GPU architecture, cycle-accurate simulation results for NGPU show a 2.4x average speedup and a 2.8x average energy reduction within a 10% quality-loss margin across a diverse set of benchmarks. The proposed quality control mechanism retains a 1.9x average speedup and a 2.1x average energy reduction while reducing the degradation in result quality to 2.5%. These benefits are achieved with less than 1% area overhead.
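The quality-control mechanism the abstract describes — trading result quality against the gains from neural approximation — can be illustrated with a minimal sketch. This is not the paper's actual hardware mechanism: `precise_kernel`, `neural_approx` (a fixed polynomial standing in for a trained neural network), and the sampling-based calibration scheme are all illustrative assumptions.

```python
import math

# Hypothetical stand-in for an approximable GPU kernel (the "precise" path).
def precise_kernel(x):
    return math.sin(x) * math.exp(-x * x)

# Fixed low-order polynomial standing in for the small trained neural
# network that a neural accelerator would evaluate instead of the kernel.
# Near x = 0:  sin(x) * exp(-x^2)  ~  x - (7/6) x^3.
def neural_approx(x):
    return x - (7.0 / 6.0) * x ** 3

def quality_controlled(xs, quality_threshold=0.05, calibration=0.1):
    """Run the cheap approximate path, but for a sampled fraction of
    invocations also run the precise path and measure the relative
    error; if the error exceeds the quality threshold, emit the precise
    result instead. The threshold is the user-visible quality knob."""
    stride = int(1.0 / calibration)  # check every `stride`-th invocation
    out = []
    for i, x in enumerate(xs):
        approx = neural_approx(x)
        if i % stride == 0:  # calibration sample: compare against precise
            exact = precise_kernel(x)
            rel_err = abs(approx - exact) / (abs(exact) + 1e-12)
            if rel_err > quality_threshold:
                out.append(exact)  # quality too low: fall back to precise
                continue
        out.append(approx)
    return out
```

Tightening `quality_threshold` shifts more invocations onto the precise path, mirroring the paper's reported tradeoff (2.4x speedup at a 10% quality-loss margin versus 1.9x at 2.5% degradation).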
Pages: 482-493
Page count: 12