xDNN: Inference for Deep Convolutional Neural Networks

Cited by: 4
Authors
D'Alberto, Paolo [1 ]
Wu, Victor [1 ]
Ng, Aaron [1 ]
Nimaiyar, Rahul [1 ]
Delaye, Elliott [1 ]
Sirasao, Ashish [2 ]
Affiliations
[1] Xilinx, Logic Dr, San Jose, CA 95124 USA
[2] Facebook, 1 Hacker Way, Menlo Park, CA 94025 USA
Keywords
AI inference; low latency; high efficiency; custom architectures; optimizations;
DOI
10.1145/3473334
CLC Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
We present xDNN, an end-to-end system for deep-learning inference based on a family of specialized hardware processors synthesized on Field-Programmable Gate Arrays (FPGAs) and Convolutional Neural Networks (CNNs). We present a design optimized for low latency, high throughput, and high compute efficiency with no batching. The design is scalable and a parametric function of the number of multiply-accumulate units, the on-chip memory hierarchy, and the numerical precision. The design can produce a scaled-down processor for embedded devices, be replicated to provide more cores on larger devices, or be resized to optimize efficiency. On a Xilinx Virtex UltraScale+ VU13P FPGA, we achieve 800 MHz, close to the maximum Digital Signal Processing frequency, and above 80% efficiency of on-chip compute resources. On top of our processor family, we present a runtime system enabling the execution of different networks for different input sizes (i.e., from 224 x 224 to 2048 x 1024). We present a compiler that reads CNNs from native frameworks (i.e., MXNet, Caffe, Keras, and TensorFlow), optimizes them, generates code, and provides performance estimates. The compiler combines quantization information from the native environment with optimizations to feed the runtime with code as efficient as any hardware expert could write. We present tools that partition a CNN into subgraphs to divide the work between CPU cores and FPGAs. Notice that the software will not change when or if the FPGA design becomes an ASIC, making our work vertical and not just a proof-of-concept FPGA project. We show experimental results for accuracy, latency, and power for several networks. In summary, we achieve up to 4 times higher throughput and 3 times better power efficiency than GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, our solutions are faster than any previous FPGA-based solution and comparable to top off-the-shelf solutions.
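The abstract notes that the compiler combines quantization information from the native framework with its optimizations. As a rough illustration only (not the paper's actual quantization scheme, which is not detailed here), the sketch below shows a generic per-tensor symmetric int8 quantizer of the kind commonly used to map trained floating-point weights onto fixed-point hardware; all names are hypothetical:

```python
import numpy as np

def quantize_symmetric(weights, num_bits=8):
    """Per-tensor symmetric linear quantization (hypothetical sketch).

    Maps the float range [-max|w|, +max|w|] onto the signed integer
    range [-(2^(b-1)-1), +(2^(b-1)-1)], e.g. [-127, 127] for int8.
    Returns the integer tensor and the single float scale factor.
    """
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    scale = np.max(np.abs(weights)) / qmax   # one scale per tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from integers and scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight vector and inspect the error.
w = np.array([0.5, -1.27, 0.003, 1.27], dtype=np.float32)
q, s = quantize_symmetric(w)
w_hat = dequantize(q, s)
```

A real flow would derive scales from the native framework's calibration data per layer rather than per tensor maximum, but the round-trip above captures the basic precision trade-off the abstract alludes to.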
Pages: 29