PGLBox: Multi-GPU Graph Learning Framework for Web-Scale Recommendation

Cited by: 1

Authors:

Jiao, Xuewu [1]

Li, Weibin [1]

Wu, Xinxuan [1]

Hu, Wei [1]

Li, Miao [1]

Bian, Jiang [1]

Dai, Siming [1]

Luo, Xinsheng [1]

Hu, Mingqing [1]

Huang, Zhengjie [1]

Feng, Danlei [1]

Yang, Junchao [1]

Feng, Shikun [1]

Xiong, Haoyi [1]

Yu, Dianhai [1]

Li, Shuanglong [1]

He, Jingzhou [1]

Ma, Yanjun [1]

Liu, Lin [1]

Affiliations:

[1] Baidu Inc., Beijing, People's Republic of China

Source:

PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023

Keywords:

Graph learning; GNN; GPU graph engine; Hierarchical storage;

DOI:

10.1145/3580305.3599885

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Widely used for large-scale recommendation and online advertising, the Graph Neural Network (GNN) has demonstrated its representation learning capacity, extracting embeddings of nodes and edges by passing, transforming, and aggregating information over the graph. In this work, we propose PGLBox, a multi-GPU graph learning framework based on PaddlePaddle [24] that incorporates optimized storage, computation, and communication strategies to train deep GNNs on web-scale graphs for recommendation. Specifically, PGLBox adopts a hierarchical storage system with three layers to facilitate I/O, where graphs and embeddings are stored in the HBMs and SSDs, respectively, with MEMs as the cache. To fully utilize multiple GPUs and I/O bandwidth, PGLBox proposes an asynchronous pipeline with three stages: it first samples subgraphs from the input graph, then pulls and updates embeddings, and finally trains GNNs on the subgraphs, with parameter updates queued at the end of the pipeline. Because PGLBox can handle web-scale graphs, it becomes feasible to unify the view of GNN-based recommendation tasks for multiple advertising verticals and fuse all these graphs into a single, huge one. We evaluate PGLBox using a set of realistic GNN training tasks for recommendation, and compare the performance of PGLBox on a multi-GPU server (Tesla A100x8) against the legacy training system based on a 40-node MPI cluster at Baidu. The overall comparisons show that PGLBox can save up to 55% of the monetary cost of training GNN models and achieve up to a 14x training speedup with the same accuracy as the legacy trainer. The open-source implementation of PGLBox is available at https://github.com/PaddlePaddle/PGL/tree/main/apps/PGLBox.
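
The three-stage asynchronous pipeline mentioned in the abstract can be illustrated with a minimal Python sketch. This is not PGLBox code: the function names (`sample_subgraph`, `pull_embeddings`, `train_step`, `run_pipeline`), the toy graph, and the use of CPU threads with bounded queues are all assumptions made for illustration. It only shows the general idea that sampling, embedding lookup, and training can overlap when each stage runs concurrently and hands work to the next through a queue.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream (hypothetical convention)


def run_stage(work, inbox, outbox):
    """Apply `work` to every item from `inbox` and forward results to `outbox`."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def sample_subgraph(seed, graph):
    # Stage 1 (toy): the seed node plus its direct neighbors.
    return [seed] + graph[seed]


def pull_embeddings(nodes, table):
    # Stage 2 (toy): look up embeddings, creating missing ones on first use.
    return nodes, [table.setdefault(n, float(n)) for n in nodes]


def train_step(batch):
    # Stage 3 (toy): stand-in for a GNN step; returns the mean embedding.
    _, embeddings = batch
    return sum(embeddings) / len(embeddings)


def run_pipeline(graph, seeds, queue_size=2):
    table = {}  # toy embedding table, touched only by stage 2
    q_seeds = queue.Queue(maxsize=queue_size)
    q_sub = queue.Queue(maxsize=queue_size)
    q_batch = queue.Queue(maxsize=queue_size)
    q_loss = queue.Queue()  # unbounded so the last stage never blocks
    stages = [
        (lambda s: sample_subgraph(s, graph), q_seeds, q_sub),
        (lambda n: pull_embeddings(n, table), q_sub, q_batch),
        (train_step, q_batch, q_loss),
    ]
    threads = [threading.Thread(target=run_stage, args=s) for s in stages]
    for t in threads:
        t.start()
    for seed in seeds:
        q_seeds.put(seed)  # blocks when stage 1 falls behind (back-pressure)
    q_seeds.put(SENTINEL)
    for t in threads:
        t.join()
    losses = []
    while True:
        item = q_loss.get()
        if item is SENTINEL:
            return losses
        losses.append(item)
```

The bounded queues keep a fast stage from running arbitrarily far ahead of a slow one, which is one common way to balance stages in such a pipeline; how PGLBox itself schedules its stages and queues parameter updates is not specified in this record.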


Pages: 4262-4272

Page count: 11
