Distributed Recommendation Inference on FPGA Clusters

Cited by: 9
Authors
Zhu, Yu [1]
He, Zhenhao [1]
Jiang, Wenqi [1]
Zeng, Kai [2]
Zhou, Jingren [2]
Alonso, Gustavo [1]
Affiliations
[1] Swiss Federal Institute of Technology (ETH Zurich), Systems Group, Zurich, Switzerland
[2] Alibaba Group, Hangzhou, People's Republic of China
Source
2021 31st International Conference on Field-Programmable Logic and Applications (FPL 2021), 2021
DOI
10.1109/FPL53798.2021.00057
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deep neural networks are widely used in personalized recommendation systems. Such models involve two major components: the memory-bound embedding layer and the computation-bound fully-connected layers. Existing solutions are either slow on both stages or optimize only one of them. To implement recommendation inference efficiently in the context of a real deployment, we design and implement an FPGA cluster that optimizes the performance of both stages. To remove the memory bottleneck, we take advantage of the High-Bandwidth Memory (HBM) available on the latest FPGAs for highly concurrent embedding table lookups. To match the required DNN computation throughput, we partition the workload across multiple FPGAs interconnected via a 100 Gbps TCP/IP network. Compared to an optimized CPU baseline (16 vCPU, AVX2-enabled) and a one-node FPGA implementation, our system (four-node version) achieves 28.95x and 7.68x speedup in throughput, respectively. The proposed system also guarantees a latency of tens of microseconds per single inference, significantly better than CPU- and GPU-based systems, which take at least milliseconds.
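The two stages named in the abstract can be illustrated with a small CPU-only sketch. This is not the authors' FPGA design: the table count, table sizes, layer widths, and all function names (`embedding_lookup`, `dense_layer`, `infer`, `build_toy_model`) are hypothetical, chosen only to show why the first stage is dominated by independent memory reads and the second by multiply-accumulate work.

```python
import random


def embedding_lookup(tables, ids):
    """Memory-bound stage: read one row from each table and concatenate.

    Each lookup is independent of the others, which is the property that
    makes concurrent access across memory channels attractive.
    """
    features = []
    for table, idx in zip(tables, ids):
        features.extend(table[idx])
    return features


def dense_layer(x, weights, bias, relu=True):
    """Compute-bound stage: one fully-connected layer, y = W x + b."""
    out = []
    for row, b in zip(weights, bias):
        y = sum(w * v for w, v in zip(row, x)) + b
        out.append(max(y, 0.0) if relu else y)
    return out


def infer(tables, layers, ids):
    """Run one inference: embedding lookups followed by the dense layers."""
    x = embedding_lookup(tables, ids)
    for i, (w, b) in enumerate(layers):
        # ReLU on hidden layers only; the last layer emits a raw score.
        x = dense_layer(x, w, b, relu=(i < len(layers) - 1))
    return x[0]


def build_toy_model(seed=0, num_tables=4, rows=100, dim=8, hidden=16):
    """Build random tables and weights (toy sizes, assumed for illustration)."""
    rng = random.Random(seed)
    tables = [
        [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(rows)]
        for _ in range(num_tables)
    ]

    def layer(n_out, n_in):
        w = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        return w, [0.0] * n_out

    layers = [layer(hidden, num_tables * dim), layer(1, hidden)]
    return tables, layers


if __name__ == "__main__":
    tables, layers = build_toy_model()
    print(infer(tables, layers, [3, 14, 15, 92]))
```

In this sketch the lookup loop touches one row per table with no arithmetic, while the dense layers perform all of the multiply-accumulate operations; the abstract's design assigns the former to HBM and spreads the latter across several FPGAs.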
Pages: 279-285
Page count: 7