Distributed Recommendation Inference on FPGA Clusters

Cited by: 9
Authors
Zhu, Yu [1]
He, Zhenhao [1]
Jiang, Wenqi [1]
Zeng, Kai [2]
Zhou, Jingren [2]
Alonso, Gustavo [1]
Affiliations
[1] Swiss Federal Institute of Technology, Systems Group, Zurich, Switzerland
[2] Alibaba Group, Hangzhou, People's Republic of China
DOI
10.1109/FPL53798.2021.00057
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Deep neural networks are widely used in personalized recommendation systems. Such models involve two major components: the memory-bound embedding layer and the computation-bound fully-connected layers. Existing solutions are either slow on both stages or only optimize one of them. To implement recommendation inference efficiently in the context of a real deployment, we design and implement an FPGA cluster optimizing the performance of both stages. To remove the memory bottleneck, we take advantage of the High-Bandwidth Memory (HBM) available on the latest FPGAs for highly concurrent embedding table lookups. To match the required DNN computation throughput, we partition the workload across multiple FPGAs interconnected via a 100 Gbps TCP/IP network. Compared to an optimized CPU baseline (16 vCPU, AVX2-enabled) and a one-node FPGA implementation, our system (four-node version) achieves 28.95x and 7.68x speedup in terms of throughput respectively. The proposed system also guarantees a latency of tens of microseconds per single inference, significantly better than CPU and GPU-based systems which take at least milliseconds.
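The two stages named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the table sizes, layer widths, and the `partition_layers` scheme (consecutive layers assigned to pipeline stages, one stage per node) are assumptions chosen for illustration, since the abstract does not say how the workload is split across FPGAs.

```python
import random

def embedding_lookup(tables, ids):
    """Memory-bound stage: fetch one row per table and concatenate the rows."""
    features = []
    for table, idx in zip(tables, ids):
        features.extend(table[idx])
    return features

def dense_layer(weights, bias, x):
    """Compute-bound stage: one fully-connected layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def make_layer(rng, n_in, n_out):
    """Random weights and zero bias for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def infer(tables, layers, ids):
    """Single-node inference: lookups first, then every layer in order."""
    x = embedding_lookup(tables, ids)
    for weights, bias in layers:
        x = dense_layer(weights, bias, x)
    return x

def partition_layers(layers, n_nodes):
    """Hypothetical scheme: group consecutive layers into at most n_nodes stages."""
    per_node = -(-len(layers) // n_nodes)  # ceiling division
    return [layers[i:i + per_node] for i in range(0, len(layers), per_node)]

def infer_pipelined(tables, stages, ids):
    """Same computation, with each stage standing in for one node."""
    x = embedding_lookup(tables, ids)
    for stage in stages:  # in a cluster, x would cross the network between stages
        for weights, bias in stage:
            x = dense_layer(weights, bias, x)
    return x

rng = random.Random(0)
# Three embedding tables, 100 rows each, 4 values per row (illustrative sizes).
tables = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(100)]
          for _ in range(3)]
layers = [make_layer(rng, 12, 8), make_layer(rng, 8, 8),
          make_layer(rng, 8, 4), make_layer(rng, 4, 1)]
ids = [7, 42, 99]

single = infer(tables, layers, ids)
stages = partition_layers(layers, 2)
pipelined = infer_pipelined(tables, stages, ids)
```

Splitting the layers into stages leaves the result unchanged; it only changes where each layer runs, which is why the fully-connected part can be spread over several devices while the lookups are served from memory.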
Pages: 279-285
Number of pages: 7