Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?

Cited by: 247
Authors
Nurvitadhi, Eriko [1 ]
Venkatesh, Ganesh [1 ]
Sim, Jaewoong [1 ]
Marr, Debbie [1 ]
Huang, Randy [2 ]
Ong, Jason Gee Hock [2 ]
Liew, Yeong Tat [2 ]
Srivatsan, Krishnan [3 ]
Moss, Duncan [3 ]
Subhaschandra, Suchit [3 ]
Boudoukh, Guy [4 ]
Affiliations
[1] Intel Corp, Accelerator Architecture Lab, Santa Clara, CA 95051 USA
[2] Intel Corp, Programmable Solutions Group, Santa Clara, CA 95051 USA
[3] Intel Corp, FPGA Product Team, Santa Clara, CA 95051 USA
[4] Intel Corp, Computer Vision Group, Santa Clara, CA 95051 USA
Keywords
Deep Learning; Accelerator; Intel Stratix 10 FPGA; GPU
DOI
10.1145/3020078.3021740
CLC Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Current-generation Deep Neural Networks (DNNs), such as AlexNet and VGG, rely heavily on dense floating-point matrix multiplication (GEMM), which maps well to GPUs (regular parallelism, high TFLOP/s). Because of this, GPUs are widely used for accelerating DNNs. Current FPGAs offer superior energy efficiency (Ops/Watt), but they do not offer the performance of today's GPUs on DNNs. In this paper, we look at upcoming FPGA technology advances and the rapid pace of innovation in DNN algorithms, and consider whether future high-performance FPGAs will outperform GPUs for next-generation DNNs. The upcoming Intel® 14-nm Stratix™ 10 FPGAs will have thousands of hard floating-point units (DSPs) and on-chip RAMs (M20K memory blocks). They will also have high-bandwidth memories (HBMs) and improved frequency (HyperFlex™ core architecture). This combination of features brings FPGA raw floating-point performance within striking distance of GPUs. Meanwhile, DNNs are quickly evolving. For example, recent innovations that exploit sparsity (e.g., pruning) and compact data types (e.g., 1-2 bit) result in major leaps in algorithmic efficiency. However, these innovations introduce irregular parallelism on custom data types, which is difficult for GPUs to handle but a great fit for the FPGA's extreme customizability. This paper evaluates a selection of emerging DNN algorithms on two generations of Intel FPGAs (Arria™ 10, Stratix™ 10) against the latest high-performance Titan X Pascal GPU. We created a customizable DNN accelerator template for FPGAs and used it in our evaluations. First, we study various GEMM operations for next-generation DNNs. Our results show that the Stratix 10 FPGA is 10%, 50%, and 5.4x better in performance (TOP/sec) than the Titan X Pascal GPU on GEMM operations for pruned, Int6, and binarized DNNs, respectively. Then, we present a detailed case study on accelerating Ternary ResNet, which relies on sparse GEMM on 2-bit weights (i.e., weights constrained to {0, +1, -1}) and full-precision neurons. The Ternary ResNet accuracy is within ~1% of the full-precision ResNet that won the 2015 ImageNet competition. On Ternary ResNet, the Stratix 10 FPGA delivers 60% better performance than the Titan X Pascal GPU, while being 2.3x better in performance/watt. Our results indicate that FPGAs may become the platform of choice for accelerating next-generation DNNs.
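To make the ternary arithmetic in the abstract concrete: because every weight is 0, +1, or -1, each multiply in the GEMM collapses into a skip, an add, or a subtract. Below is a minimal NumPy sketch of this idea; the function name ternary_gemm and the NumPy formulation are our own illustration, not the accelerator template described in the paper.

```python
import numpy as np

def ternary_gemm(weights, activations):
    """Sketch of a GEMM whose weights are constrained to {0, +1, -1}.

    Each multiply collapses into an add (+1), a subtract (-1), or a
    skip (0), so no hardware multiplier is needed on the weight side.
    """
    m, k = weights.shape
    k2, n = activations.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=activations.dtype)
    for i in range(m):
        for j in range(k):
            w = weights[i, j]
            if w == 1:        # +1 weight: accumulate the activation row
                out[i] += activations[j]
            elif w == -1:     # -1 weight: subtract the activation row
                out[i] -= activations[j]
            # w == 0: skipped entirely -- the sparsity being exploited
    return out

# Illustrative check against an ordinary dense GEMM.
rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8)).astype(np.float32)  # ternary weights
A = rng.standard_normal((8, 3)).astype(np.float32)       # full-precision neurons
assert np.allclose(ternary_gemm(W, A), W @ A)
```

The zero-weight skip is where the sparsity win comes from, and the remaining add/subtract datapath needs no multipliers; this kind of irregular, custom-data-type structure is what the abstract argues favors the FPGA's customizable fabric over the GPU's fixed floating-point pipelines.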
Pages: 5-14
Page count: 10
Related Papers (50 in total)
  • [1] Karki, Aajna; Keshava, Chethan Palangotu; Shivakumar, Spoorthi Mysore; Skow, Joshua; Hegde, Goutam Madhukeshwar; Jeon, Hyeran. Detailed Characterization of Deep Neural Networks on GPUs and FPGAs. In: 12th Workshop on General Purpose Processing Using GPUs (GPGPU 12), 2019, pp. 12-21.
  • [2] Huang, Sitao; Pearson, Carl; Nagi, Rakesh; Xiong, Jinjun; Chen, Deming; Hwu, Wen-mei. Accelerating Sparse Deep Neural Networks on FPGAs. In: 2019 IEEE High Performance Extreme Computing Conference (HPEC), 2019.
  • [3] Lee, Han Sung; Jeon, Jae Wook. Accelerating Deep Neural Networks Using FPGAs and ZYNQ. In: 2021 IEEE Region 10 Symposium (TENSYMP), 2021.
  • [4] Hadjis, Stefan; Olukotun, Kunle. TensorFlow to Cloud FPGAs: Tradeoffs for Accelerating Deep Neural Networks. In: 2019 29th International Conference on Field-Programmable Logic and Applications (FPL), 2019, pp. 360-366.
  • [5] Xiao, Qincheng; Liang, Yun; Lu, Liqiang; Yan, Shengen; Tai, Yu-Wing. Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs. In: Proceedings of the 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), 2017.
  • [6] Garg, Sahil; Hu, Jia; Fortino, Giancarlo; Yang, Laurence T.; Guizani, Mohsen; Deng, Xianjun; Rawat, Danda B. Deep reinforcement learning for next-generation IoT networks. Computer Networks, 2023, vol. 228.
  • [7] Hamilton, Kathleen E.; Schuman, Catherine D.; Young, Steven R.; Imam, Neena; Humble, Travis S. Neural Networks and Graph Algorithms with Next-Generation Processors. In: 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2018), 2018, pp. 1194-1203.
  • [8] Dowden, D. C.; Gitlin, R. D.; Martin, R. L. Next-generation networks. Bell Labs Technical Journal, 1998, 3(4), pp. 3-14.
  • [9] Halepete, Sameer. Accelerating the Design and Performance of Next Generation Computing Systems with GPUs. In: ISPD '22: Proceedings of the 2022 International Symposium on Physical Design, 2022, p. 149.
  • [10] Deisher, Michael; Polonski, Andrzej. Implementation of Efficient, Low Power Deep Neural Networks on Next-Generation Intel Client Platforms. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, p. 6590.