xDNN: Inference for Deep Convolutional Neural Networks

Cited: 4
Authors
D'Alberto, Paolo [1 ]
Wu, Victor [1 ]
Ng, Aaron [1 ]
Nimaiyar, Rahul [1 ]
Delaye, Elliott [1 ]
Sirasao, Ashish [2 ]
Affiliations
[1] Xilinx, Logic Dr, San Jose, CA 95124 USA
[2] FaceBook, 1 Hacker Way, Menlo Pk, CA 94025 USA
Keywords
AI inference; low latency; high efficiency; custom architectures; optimizations;
DOI
10.1145/3473334
CLC Classification Number
TP3 [computing technology; computer technology];
Subject Classification Code
0812 ;
Abstract
We present xDNN, an end-to-end system for deep-learning inference based on a family of specialized hardware processors synthesized on Field-Programmable Gate Arrays (FPGAs) and Convolutional Neural Networks (CNNs). We present a design optimized for low latency, high throughput, and high compute efficiency with no batching. The design is scalable and a parametric function of the number of multiply-accumulate units, the on-chip memory hierarchy, and the numerical precision. The design can be scaled down to produce a processor for embedded devices, replicated to produce more cores for larger devices, or resized to optimize efficiency. On a Xilinx Virtex Ultrascale+ VU13P FPGA, we achieve 800 MHz, close to the maximum Digital Signal Processing (DSP) frequency, and above 80% efficiency of on-chip compute resources. On top of our processor family, we present a runtime system enabling the execution of different networks for different input sizes (i.e., from 224 x 224 to 2048 x 1024). We present a compiler that reads CNNs from native frameworks (i.e., MXNet, Caffe, Keras, and TensorFlow), optimizes them, generates code, and provides performance estimates. The compiler combines quantization information from the native environment with optimizations to feed the runtime with code as efficient as any hardware expert could write. We present tools that partition a CNN into subgraphs to divide work between CPU cores and FPGAs. Notice that the software will not change when or if the FPGA design becomes an ASIC, making our work vertical and not just a proof-of-concept FPGA project. We show experimental results for accuracy, latency, and power for several networks. In summary, we achieve up to 4 times higher throughput and 3 times better power efficiency than GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, our solutions are faster than any previous FPGA-based solutions and comparable to other off-the-shelf solutions.
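The abstract describes tools that partition a CNN into subgraphs so that accelerator-supported operators run on the FPGA while the remainder falls back to CPU cores. A minimal sketch of that idea, assuming a linear operator sequence and a hypothetical supported-op set (this is an illustration, not the actual xDNN partitioner):

```python
# Illustrative sketch (not the xDNN tool itself): split a linear CNN
# operator sequence into contiguous subgraphs, assigning runs of
# accelerator-supported ops to the FPGA and everything else to the CPU.

# Hypothetical set of ops the accelerator can execute.
FPGA_SUPPORTED = {"conv", "relu", "maxpool", "eltwise_add"}

def partition(ops):
    """Group consecutive ops into (target, ops) subgraphs,
    where target is 'fpga' or 'cpu'."""
    subgraphs = []
    for op in ops:
        target = "fpga" if op in FPGA_SUPPORTED else "cpu"
        if subgraphs and subgraphs[-1][0] == target:
            # Same target as the previous op: extend the current subgraph.
            subgraphs[-1][1].append(op)
        else:
            # Target changed: start a new subgraph.
            subgraphs.append((target, [op]))
    return subgraphs

net = ["conv", "relu", "maxpool", "softmax", "conv", "relu"]
print(partition(net))
# [('fpga', ['conv', 'relu', 'maxpool']), ('cpu', ['softmax']), ('fpga', ['conv', 'relu'])]
```

A real partitioner works on a dataflow graph rather than a flat list and must also weigh data-transfer cost between devices, but the grouping of maximal supported runs is the core of the division of work the abstract refers to.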
Pages: 29
Related Papers
50 records
  • [1] Towards explainable deep neural networks (xDNN)
    Angelov, Plamen
    Soares, Eduardo
    [J]. NEURAL NETWORKS, 2020, 130 : 185 - 194
  • [2] Deep inference: A Convolutional Neural Networks Method for Parameter Recovery of the Fractional Dynamics
    Biranvand, N.
    Hadian-Rasanan, A. H.
    Khalili, A.
    Rad, J. A.
    [J]. INTERNATIONAL JOURNAL OF NONLINEAR ANALYSIS AND APPLICATIONS, 2021, 12 (01): : 189 - 201
  • [3] Deep Convolutional Neural Networks
    Gonzalez, Rafael C.
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2018, 35 (06) : 79 - 87
  • [4] Simulating quantized inference on convolutional neural networks
    Finotti, Vitor
    Albertini, Bruno
    [J]. COMPUTERS & ELECTRICAL ENGINEERING, 2021, 95
  • [6] FPGA based Flexible Implementation of Light Weight Inference on Deep Convolutional Neural Networks
    Dawwd, Shefa
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2024, 21 (03) : 408 - 417
  • [7] Efficient adaptive inference for deep convolutional neural networks using hierarchical early exits
    Passalis, Nikolaos
    Raitoharju, Jenni
    Tefas, Anastasios
    Gabbouj, Moncef
    [J]. PATTERN RECOGNITION, 2020, 105
  • [8] DEEP NEURAL NETWORKS FOR ESTIMATION AND INFERENCE
    Farrell, Max H.
    Liang, Tengyuan
    Misra, Sanjog
    [J]. ECONOMETRICA, 2021, 89 (01) : 181 - 213
  • [9] Property Inference for Deep Neural Networks
    Gopinath, Divya
    Converse, Hayes
    Pasareanu, Corina S.
    Taly, Ankur
    [J]. 34TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE 2019), 2019, : 809 - 821
  • [10] Deep Anchored Convolutional Neural Networks
    Huang, Jiahui
    Dwivedi, Kshitij
    Roig, Gemma
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 639 - 647