PGLBox: Multi-GPU Graph Learning Framework for Web-Scale Recommendation

Cited by: 1
Authors
Jiao, Xuewu [1]
Li, Weibin [1]
Wu, Xinxuan [1]
Hu, Wei [1]
Li, Miao [1]
Bian, Jiang [1]
Dai, Siming [1]
Luo, Xinsheng [1]
Hu, Mingqing [1]
Huang, Zhengjie [1]
Feng, Danlei [1]
Yang, Junchao [1]
Feng, Shikun [1]
Xiong, Haoyi [1]
Yu, Dianhai [1]
Li, Shuanglong [1]
He, Jingzhou [1]
Ma, Yanjun [1]
Liu, Lin [1]
Affiliations
[1] Baidu Inc., Beijing, People's Republic of China
Keywords
Graph learning; GNN; GPU graph engine; Hierarchical storage
DOI
10.1145/3580305.3599885
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Graph Neural Networks (GNNs) are widely used for large-scale recommendation and online advertising, where they have demonstrated their representation learning capacity: they extract embeddings of nodes and edges by passing, transforming, and aggregating information over the graph. In this work, we propose PGLBox, a multi-GPU graph learning framework based on PaddlePaddle [24] that incorporates optimized storage, computation, and communication strategies to train deep GNNs on web-scale graphs for recommendation. Specifically, PGLBox adopts a three-layer hierarchical storage system to facilitate I/O, where graphs and embeddings are stored in HBM and on SSDs, respectively, with main memory (MEM) as the cache. To fully utilize multiple GPUs and the available I/O bandwidth, PGLBox uses an asynchronous three-stage pipeline: it first samples subgraphs from the input graph, then pulls and updates embeddings, and then trains GNNs on the subgraph, with parameter updates queued at the end of the pipeline. Because PGLBox can handle web-scale graphs, it becomes feasible to unify the GNN-based recommendation tasks of multiple advertising verticals and fuse all of their graphs into a single, very large graph. We evaluate PGLBox on a set of realistic GNN training tasks for recommendation, comparing PGLBox on a multi-GPU server (Tesla A100 x 8) with the legacy training system based on a 40-node MPI cluster at Baidu. Overall, PGLBox saves up to 55% of the monetary cost of training GNN models and achieves up to a 14x training speedup with the same accuracy as the legacy trainer. The open-source implementation of PGLBox is available at https://github.com/PaddlePaddle/PGL/tree/main/apps/PGLBox.
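The three-stage asynchronous pipeline described in the abstract can be illustrated with a small CPU-only sketch. This is not PGLBox's implementation: it assumes Python threads and bounded queues in place of GPUs, a toy adjacency dictionary as the graph, scalar "embeddings", and a stand-in training step. All names (`run_pipeline`, `sample_stage`, `pull_stage`, `train_stage`) are hypothetical. It is also simplified so that embedding updates are applied only in the last stage.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream between stages


def run_pipeline(graph, embeddings, seed_batches, lr=0.1):
    """Toy three-stage pipeline: sample -> pull embeddings -> train/update.

    graph:        dict mapping node id -> list of neighbour ids (assumed format)
    embeddings:   dict mapping node id -> float (scalar stand-in for a vector)
    seed_batches: list of lists of seed node ids
    Returns the list of per-batch losses, in batch order.
    """
    # Bounded queues let stages overlap while providing back-pressure.
    sampled_q = queue.Queue(maxsize=2)
    pulled_q = queue.Queue(maxsize=2)
    losses = []

    def sample_stage():
        # Stage 1: build a subgraph from seed nodes and their 1-hop neighbours.
        for seeds in seed_batches:
            nodes = sorted(set(seeds) | {n for s in seeds for n in graph[s]})
            sampled_q.put(nodes)
        sampled_q.put(SENTINEL)

    def pull_stage():
        # Stage 2: fetch the embeddings needed by each sampled subgraph.
        # Values may be stale if an earlier batch is still being trained.
        while (nodes := sampled_q.get()) is not SENTINEL:
            pulled_q.put({n: embeddings[n] for n in nodes})
        pulled_q.put(SENTINEL)

    def train_stage():
        # Stage 3: stand-in "training" that pulls each value toward the
        # batch mean, then applies the update to the shared table.
        while (batch := pulled_q.get()) is not SENTINEL:
            mean = sum(batch.values()) / len(batch)
            losses.append(sum((v - mean) ** 2 for v in batch.values()))
            for n, v in batch.items():
                embeddings[n] -= lr * (v - mean)

    threads = [
        threading.Thread(target=stage)
        for stage in (sample_stage, pull_stage, train_stage)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return losses
```

Because the stages run concurrently, sampling for a later batch can proceed while an earlier batch is still training. The cost is that the pull stage may read values that the training stage has not yet updated, which is why the sketch applies updates as deltas rather than overwriting the table.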
Pages: 4262-4272
Number of pages: 11
Related papers
50 in total
  • [1] Multi-GPU Graph Analytics
    Pan, Yuechao
    Wang, Yangzihao
    Wu, Yuduo
    Yang, Carl
    Owens, John D.
    [J]. 2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2017, : 479 - 490
  • [2] Fast STA Graph Partitioning Framework for Multi-GPU Acceleration
    Guo, Guannan
    Huang, Tsung-Wei
    Wong, Martin
    [J]. 2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023,
  • [3] Large-Scale Graph Processing on Multi-GPU Platforms
    Zhang, H.
    Zhang, L.
    Wu, Y.
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2018, 55 (02): : 273 - 288
  • [4] Web-Scale Media Recommendation Systems
    Dror, Gideon
    Koenigstein, Noam
    Koren, Yehuda
    [J]. PROCEEDINGS OF THE IEEE, 2012, 100 (09) : 2722 - 2736
  • [5] Efficient Large-scale Deep Learning Framework for Heterogeneous Multi-GPU Cluster
    Kim, Youngrang
    Choi, Hyeonseong
    Lee, Jaehwan
    Kim, Jik-Soo
    Jei, Hyunseung
    Roh, Hongchan
    [J]. 2019 IEEE 4TH INTERNATIONAL WORKSHOPS ON FOUNDATIONS AND APPLICATIONS OF SELF* SYSTEMS (FAS*W 2019), 2019, : 176 - 181
  • [6] Lion: A GPU-Accelerated Online Serving System for Web-Scale Recommendation at Baidu
    Liu, Hao
    Gao, Qian
    Liao, Xiaochao
    Chen, Guangxing
    Xiong, Hao
    Ren, Silin
    Yang, Guobao
    Zha, Zhiwei
    [J]. PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 3388 - 3397
  • [7] Task-based Recommendation on a Web-Scale
    Zhang, Yongfeng
    Zhang, Min
    Liu, Yiqun
    Tat-Seng, Chua
    Zhang, Yi
    Ma, Shaoping
    [J]. PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 827 - 836
  • [8] M2GRL: A Multi-task Multi-view Graph Representation Learning Framework for Web-scale Recommender Systems
    Wang, Menghan
    Lin, Yujie
    Lin, Guli
    Yang, Keping
    Wu, Xiao-ming
    [J]. KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 2349 - 2358
  • [9] Moim: A Multi-GPU MapReduce Framework
    Xie, Mengjun
    Kang, Kyoung-Don
    Basaran, Can
    [J]. 2013 IEEE 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2013), 2013, : 1279 - 1286
  • [10] Learning Query and Document Relevance from a Web-scale Click Graph
    Jiang, Shan
    Hu, Yuening
    Kang, Changsung
    Daly, Tim, Jr.
    Yin, Dawei
    Chang, Yi
    Zhai, Chengxiang
    [J]. SIGIR'16: PROCEEDINGS OF THE 39TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2016, : 185 - 194