FSGraph: fast and scalable implementation of graph traversal on GPUs

Cited by: 1

Authors:

Zhang, Yuan ^{[1,2]}

Cao, Huawei ^{[1,3]}

Liang, Yan ^{[1,2]}

Zhang, Jie ^{[1,2]}

Huang, Junying ^{[1]}

Ye, Xiaochun ^{[1]}

An, Xuejun ^{[1,2]}

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Univ Chinese Acad Sci, Nanjing 211135, Peoples R China

Source:

CCF TRANSACTIONS ON HIGH PERFORMANCE COMPUTING | 2023 / Vol. 5 / Issue 03

Funding:

Beijing Natural Science Foundation

Keywords:

BFS; GPU-friendly CSR structure; Bidirectional 1d partition; UM-aware communication; ALGORITHMS;

DOI:

10.1007/s42514-023-00155-x

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Graphs are one of the best ways to express and process association relationships. They are widely used in various applications, including social networks, fraud detection, and the Internet of Things. As a typical graph traversal algorithm, Breadth-First Search (BFS) performs poorly on GPUs due to strong data dependency, intensive irregular memory access, and low computation intensity. On multiple GPUs, the situation is even worse because of unbalanced data partitioning and high communication-to-computation ratios. In this paper, we implement FSGraph, a fast and scalable BFS implementation on GPUs. In FSGraph, we propose three optimization techniques: a GPU-friendly Compressed Sparse Row (CSR) structure, bidirectional one-dimensional (1d) partition, and UM-aware communication. We have evaluated our work with extensive experiments on four T4 and four V100 GPUs. The average BFS performance on four T4 GPUs is 132.67 Giga-Traversed Edges per Second (GTEPS), up to a 1.44x improvement over a single T4. On four V100 GPUs, BFS performance reaches 392.35 GTEPS and outperforms an existing CPU-based cluster with 1024 nodes on the November 2022 Graph500 list.

Pages: 277-291

Page count: 15
