FSGraph: fast and scalable implementation of graph traversal on GPUs

Cited by: 1

Authors:

Zhang, Yuan [1,2]

Cao, Huawei [1,3]

Liang, Yan [1,2]

Zhang, Jie [1,2]

Huang, Junying [1]

Ye, Xiaochun [1]

An, Xuejun [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Univ Chinese Acad Sci, Nanjing 211135, Peoples R China

Source:

CCF Transactions on High Performance Computing | 2023, Vol. 5, No. 3

Funding:

Beijing Natural Science Foundation;

Keywords:

BFS; GPU-friendly CSR structure; Bidirectional 1d partition; UM-aware communication; ALGORITHMS;

DOI:

10.1007/s42514-023-00155-x

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Graphs are one of the best ways to express and process association relationships, and they are widely used in applications including social networks, fraud detection, and the Internet of Things. Breadth-First Search (BFS) is a typical graph traversal algorithm, but its performance on a GPU is poor because of strong data dependency, intensive irregular memory access, and low computation intensity. On multiple GPUs, the situation is even worse because of unbalanced data partitioning and high communication-to-computation ratios. In this paper, we implement FSGraph, a fast and scalable BFS implementation on GPUs. In FSGraph, we propose three optimization techniques: a GPU-friendly Compressed Sparse Row (CSR) structure, a bidirectional one-dimensional (1d) partition, and UM-aware communication. We have evaluated our work with extensive experiments on four T4 and four V100 GPUs. The average BFS performance on four T4 GPUs is 132.67 Giga-Traversed Edges per Second (GTEPS), up to a 1.44x improvement over a single T4. On four V100 GPUs, BFS performance reaches 392.35 GTEPS and outperforms an existing CPU-based cluster with 1024 nodes on the November 2022 Graph500 list.
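For readers unfamiliar with the terms used in the abstract, the following is a minimal single-threaded Python sketch of level-synchronous BFS over a plain CSR graph, plus the arithmetic behind a GTEPS figure. It is not FSGraph's GPU implementation and does not reproduce its GPU-friendly CSR variant, partitioning, or communication scheme. The names `bfs_csr`, `gteps`, `row_offsets`, and `col_indices` are hypothetical and chosen for illustration only.

```python
from typing import List


def bfs_csr(row_offsets: List[int], col_indices: List[int], source: int) -> List[int]:
    """Level-synchronous BFS on a CSR graph.

    Returns the BFS depth of every vertex, or -1 if it is unreachable.
    Illustrative only: a plain CPU sketch, not the paper's GPU kernel.
    """
    num_vertices = len(row_offsets) - 1
    depth = [-1] * num_vertices
    depth[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        next_frontier = []
        for u in frontier:
            # Neighbors of u are stored contiguously in
            # col_indices[row_offsets[u] : row_offsets[u + 1]].
            for v in col_indices[row_offsets[u]:row_offsets[u + 1]]:
                if depth[v] == -1:
                    depth[v] = level + 1
                    next_frontier.append(v)
        frontier = next_frontier
        level += 1
    return depth


def gteps(traversed_edges: int, seconds: float) -> float:
    """Giga-Traversed Edges per Second: edges / time / 10^9."""
    return traversed_edges / seconds / 1e9


# Toy undirected graph: edges 0-1, 0-2, 1-3, 2-3; vertex 4 is isolated.
row_offsets = [0, 2, 4, 6, 8, 8]
col_indices = [1, 2, 0, 3, 0, 3, 1, 2]
depths = bfs_csr(row_offsets, col_indices, source=0)
```

In this sketch each BFS level is processed in full before the next one starts, which is the source of the data dependency the abstract mentions, and the neighbor lookups through `col_indices` are the irregular memory accesses.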

Pages: 277-291

Page count: 15
