xDNN: Inference for Deep Convolutional Neural Networks

Cited by: 4
Authors
D'Alberto, Paolo [1 ]
Wu, Victor [1 ]
Ng, Aaron [1 ]
Nimaiyar, Rahul [1 ]
Delaye, Elliott [1 ]
Sirasao, Ashish [2 ]
Affiliations
[1] Xilinx, Logic Dr, San Jose, CA 95124 USA
[2] Facebook, 1 Hacker Way, Menlo Park, CA 94025 USA
Keywords
AI inference; low latency; high efficiency; custom architectures; optimizations;
DOI
10.1145/3473334
CLC Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
We present xDNN, an end-to-end system for deep-learning inference based on a family of specialized hardware processors synthesized on Field-Programmable Gate Arrays (FPGAs) and Convolutional Neural Networks (CNNs). We present a design optimized for low latency, high throughput, and high compute efficiency with no batching. The design is scalable and a parametric function of the number of multiply-accumulate units, the on-chip memory hierarchy, and the numerical precision. The design can produce a scaled-down processor for embedded devices, be replicated to provide more cores on larger devices, or be resized to optimize efficiency. On a Xilinx Virtex UltraScale+ VU13P FPGA, we achieve 800 MHz, close to the maximum Digital Signal Processing frequency, and above 80% efficiency of on-chip compute resources. On top of our processor family, we present a runtime system enabling the execution of different networks for different input sizes (i.e., from 224 x 224 to 2048 x 1024). We present a compiler that reads CNNs from native frameworks (i.e., MXNet, Caffe, Keras, and TensorFlow), optimizes them, generates code, and provides performance estimates. The compiler combines quantization information from the native environment with optimizations to feed the runtime with code as efficient as any hardware expert could write. We present tools that partition a CNN into subgraphs to divide the work between CPU cores and FPGAs. Notice that the software will not change when or if the FPGA design becomes an ASIC, making our work vertical and not just a proof-of-concept FPGA project. We show experimental results for accuracy, latency, and power for several networks. In summary, we achieve up to 4 times higher throughput and 3 times better power efficiency than GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, our solutions are faster than any previous FPGA-based solution and comparable to top off-the-shelf solutions.
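The abstract notes that the compiler combines quantization information from the native framework with its optimizations. As a rough illustration only (not the paper's actual quantization scheme, which is not detailed here), the sketch below shows a generic per-tensor symmetric int8 quantizer of the kind commonly used to map trained floating-point weights onto fixed-point hardware; all names are hypothetical:

```python
import numpy as np

def quantize_symmetric(weights, num_bits=8):
    """Per-tensor symmetric linear quantization (hypothetical sketch).

    Maps the float range [-max|w|, +max|w|] onto the signed integer
    range [-(2^(b-1)-1), +(2^(b-1)-1)], e.g. [-127, 127] for int8.
    Returns the integer tensor and the single float scale factor.
    """
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    scale = np.max(np.abs(weights)) / qmax   # one scale per tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from integers and scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight vector and inspect the error.
w = np.array([0.5, -1.27, 0.003, 1.27], dtype=np.float32)
q, s = quantize_symmetric(w)
w_hat = dequantize(q, s)
```

A real flow would derive scales from the native framework's calibration data per layer rather than per tensor maximum, but the round-trip above captures the basic precision trade-off the abstract alludes to.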
Pages: 29