FSGraph: fast and scalable implementation of graph traversal on GPUs

Cited by: 1
Authors
Zhang, Yuan [1, 2]
Cao, Huawei [1, 3]
Liang, Yan [1, 2]
Zhang, Jie [1, 2]
Huang, Junying [1]
Ye, Xiaochun [1]
An, Xuejun [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Univ Chinese Acad Sci, Nanjing 211135, Peoples R China
Funding
Beijing Natural Science Foundation
Keywords
BFS; GPU-friendly CSR structure; Bidirectional 1d partition; UM-aware communication; ALGORITHMS;
DOI
10.1007/s42514-023-00155-x
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Graphs are one of the best ways to express and process association relationships, and they are widely used in applications such as social networks, fraud detection, and the Internet of Things. Breadth-First Search (BFS) is a typical graph traversal algorithm, yet its performance on a GPU is unsatisfactory because of strong data dependencies, intensive irregular memory accesses, and low computational intensity. On multiple GPUs, the situation is even worse because of unbalanced data partitioning and high communication-to-computation ratios. In this paper, we present FSGraph, a fast and scalable BFS implementation on GPUs. FSGraph introduces three optimization techniques: a GPU-friendly Compressed Sparse Row (CSR) structure, a bidirectional one-dimensional (1d) partition, and UM-aware communication. We have evaluated our work with extensive experiments on four T4 and four V100 GPUs. The average BFS performance on four T4 GPUs is 132.67 Giga-Traversed Edges per Second (GTEPS), up to a 1.44x improvement over a single T4. On four V100 GPUs, BFS achieves 392.35 GTEPS and outperforms an existing CPU-based cluster with 1024 nodes on the November 2022 Graph500 list.
Pages: 277-291
Page count: 15