Distributed Recommendation Inference on FPGA Clusters

Cited by: 9
Authors
Zhu, Yu [1]
He, Zhenhao [1]
Jiang, Wenqi [1]
Zeng, Kai [2]
Zhou, Jingren [2]
Alonso, Gustavo [1]
Affiliations
[1] Swiss Fed Inst Technol, Syst Grp, Zurich, Switzerland
[2] Alibaba Grp, Hangzhou, Peoples R China
Source
2021 31ST INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2021) | 2021
DOI
10.1109/FPL53798.2021.00057
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deep neural networks are widely used in personalized recommendation systems. Such models involve two major components: the memory-bound embedding layer and the computation-bound fully-connected layers. Existing solutions are either slow on both stages or only optimize one of them. To implement recommendation inference efficiently in the context of a real deployment, we design and implement an FPGA cluster optimizing the performance of both stages. To remove the memory bottleneck, we take advantage of the High-Bandwidth Memory (HBM) available on the latest FPGAs for highly concurrent embedding table lookups. To match the required DNN computation throughput, we partition the workload across multiple FPGAs interconnected via a 100 Gbps TCP/IP network. Compared to an optimized CPU baseline (16 vCPU, AVX2-enabled) and a one-node FPGA implementation, our system (four-node version) achieves 28.95x and 7.68x speedup in terms of throughput respectively. The proposed system also guarantees a latency of tens of microseconds per single inference, significantly better than CPU and GPU-based systems which take at least milliseconds.
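The two-stage structure the abstract describes (embedding table lookups followed by fully-connected layers) can be illustrated with a minimal pure-Python sketch. This is not the authors' FPGA design: the table count, embedding dimension, layer sizes, and every function name below are hypothetical and chosen only to keep the example small. It shows why the first stage is dominated by random memory accesses and the second by multiply-accumulate work.

```python
import random

def make_table(rows, dim, rng):
    """Build one embedding table: `rows` vectors of length `dim` (hypothetical sizes)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]

def embedding_stage(tables, ids):
    """Memory-bound stage: one random-access lookup per table, then concatenate."""
    features = []
    for table, idx in zip(tables, ids):
        features.extend(table[idx])
    return features

def dense_layer(x, weights, bias, relu=True):
    """Computation-bound stage: y = W x + b, optionally followed by ReLU."""
    out = []
    for row, b in zip(weights, bias):
        acc = b + sum(w * v for w, v in zip(row, x))
        out.append(max(acc, 0.0) if relu else acc)
    return out

def infer(tables, layers, ids):
    """Run one inference: embedding lookups, then the fully-connected layers."""
    x = embedding_stage(tables, ids)
    for i, (weights, bias) in enumerate(layers):
        # No ReLU on the final layer, which produces the raw score.
        x = dense_layer(x, weights, bias, relu=(i < len(layers) - 1))
    return x

rng = random.Random(0)
tables = [make_table(rows=100, dim=4, rng=rng) for _ in range(3)]  # 3 assumed tables
sizes = [12, 8, 1]  # input width 3 * 4 = 12, one hidden layer of 8, one output score
layers = []
for n_in, n_out in zip(sizes, sizes[1:]):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    layers.append((weights, [0.0] * n_out))

score = infer(tables, layers, ids=[7, 42, 99])
```

In this sketch each table lookup is independent of the others, which is the property that lets many lookups be issued concurrently against separate memory channels. The dense layers, by contrast, are a fixed amount of arithmetic per inference and could in principle be split across several devices.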
Pages: 279-285
Page count: 7