FSGraph: fast and scalable implementation of graph traversal on GPUs

Citations: 1
Authors
Zhang, Yuan [1, 2]
Cao, Huawei [1, 3]
Liang, Yan [1, 2]
Zhang, Jie [1, 2]
Huang, Junying [1]
Ye, Xiaochun [1]
An, Xuejun [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Univ Chinese Acad Sci, Nanjing 211135, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
BFS; GPU-friendly CSR structure; Bidirectional 1d partition; UM-aware communication; ALGORITHMS;
DOI
10.1007/s42514-023-00155-x
Chinese Library Classification
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Graphs are one of the best ways to express and process association relationships. They are widely used in applications such as social networks, fraud detection, and the Internet of Things. Breadth-First Search (BFS) is a typical graph traversal algorithm, but its performance on a GPU is poor because of strong data dependencies, intensive irregular memory accesses, and low computational intensity. On GPUs, the situation is even worse with unbalanced data partitioning and high communication-to-computation ratios. In this paper, we present FSGraph, a fast and scalable BFS implementation on GPUs. FSGraph introduces three optimization techniques: a GPU-friendly Compressed Sparse Row (CSR) structure, bidirectional one-dimensional (1d) partitioning, and UM-aware communication. We evaluated our work with extensive experiments on four T4 and four V100 GPUs. The average BFS performance on four T4 GPUs is 132.67 Giga-Traversed Edges per Second (GTEPS), an improvement of up to 1.44x over a single T4. On four V100 GPUs, BFS performance reaches 392.35 GTEPS and outperforms an existing CPU-based cluster with 1024 nodes on the November 2022 Graph500 list.
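For readers unfamiliar with the terms in the abstract, the following is a minimal CPU-side sketch of a level-synchronous BFS over a standard CSR graph, plus a plain contiguous 1d vertex partition. It is a textbook illustration only, not FSGraph's GPU-friendly CSR variant or its bidirectional partitioning scheme, which this record does not describe in detail. The names `bfs_csr`, `block_partition_1d`, `row_offsets`, and `col_indices` are assumptions chosen for the example.

```python
from typing import List, Tuple


def bfs_csr(row_offsets: List[int], col_indices: List[int], source: int) -> List[int]:
    """Level-synchronous BFS over a CSR graph.

    Returns the BFS level of every vertex, or -1 if it is unreachable.
    """
    num_vertices = len(row_offsets) - 1
    level = [-1] * num_vertices
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            # Neighbors of u are stored contiguously in
            # col_indices[row_offsets[u] : row_offsets[u + 1]].
            for v in col_indices[row_offsets[u]:row_offsets[u + 1]]:
                if level[v] == -1:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


def block_partition_1d(num_vertices: int, num_parts: int) -> List[Tuple[int, int]]:
    """Split vertices into contiguous half-open ranges, one per device.

    Hypothetical helper: a plain block partition, not the paper's
    bidirectional scheme.
    """
    base, extra = divmod(num_vertices, num_parts)
    ranges = []
    start = 0
    for part in range(num_parts):
        size = base + (1 if part < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Undirected example graph: edges 0-1, 0-2, 1-3, 2-3; vertex 4 is isolated.
example_row_offsets = [0, 2, 4, 6, 8, 8]
example_col_indices = [1, 2, 0, 3, 0, 3, 1, 2]
example_levels = bfs_csr(example_row_offsets, example_col_indices, source=0)
example_parts = block_partition_1d(num_vertices=5, num_parts=2)
```

A throughput figure in traversed edges per second (the TEPS in GTEPS) divides the number of edges traversed in a search by its wall-clock time. A partition that balances vertex counts, as above, does not necessarily balance edge counts, which is one way the unbalanced partitioning mentioned in the abstract can arise.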
Pages: 277-291
Page count: 15
Related papers
50 in total
  • [21] Scalable Multi-node Fast Fourier Transform on GPUs
    Verma M.
    Chatterjee S.
    Garg G.
    Sharma B.
    Arya N.
    Kumar S.
    Saxena A.
    Verma M.K.
    SN Computer Science, 4 (5)
  • [22] Graph-theoretic Formulation of QUBO for Scalable Local Search on GPUs
    Yasudo, Ryota
    Nakano, Koji
    Ito, Yasuaki
    Kawamata, Yuya
    Katsuki, Ryota
    Ozaki, Shiro
    Yazane, Takashi
    Hamano, Kenichiro
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022, : 425 - 434
  • [23] Graph traversal and graph transformation
    Holdsworth, JJ
    THEORETICAL COMPUTER SCIENCE, 2004, 321 (2-3) : 215 - 231
  • [24] TRAVERSAL: A Fast and Adaptive Graph-Based Placement and Routing for CGRAs
    Canesche, Michael
    Menezes, Marcelo
    Carvalho, Westerley
    Torres, Frank Sill
    Jamieson, Peter
    Nacif, Jose Augusto
    Ferreira, Ricardo
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2021, 40 (08) : 1600 - 1612
  • [25] Fast detection of community structures using graph traversal in social networks
    Basuchowdhuri, Partha
    Sikdar, Satyaki
    Nagarajan, Varsha
    Mishra, Khusbu
    Gupta, Surabhi
    Majumder, Subhashis
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 59 (01) : 1 - 31
  • [26] Fast detection of community structures using graph traversal in social networks
    Partha Basuchowdhuri
    Satyaki Sikdar
    Varsha Nagarajan
    Khusbu Mishra
    Surabhi Gupta
    Subhashis Majumder
    Knowledge and Information Systems, 2019, 59 : 1 - 31
  • [27] Fast Equi-Join Algorithms on GPUs: Design and Implementation
    Rui, Ran
    Tu, Yi-Cheng
    SSDBM 2017: 29TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2017,
  • [28] Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications
    Ashari, Arash
    Sedaghati, Naser
    Eisenlohr, John
    Parthasarathy, Srinivasan
    Sadayappan, P.
    SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 781 - 792
  • [29] A Scalable Parallel Implementation of Evolutionary Algorithms for Multi-Objective Optimization on GPUs
    Gupta, Samarth
    Tan, Gary
    2015 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2015, : 1567 - 1574
  • [30] CPU-Style SIMD Ray Traversal on GPUs
    Lier, Alexander
    Stamminger, Marc
    Selgrad, Kai
    HIGH-PERFORMANCE GRAPHICS 2018, 2018,