FSGraph: fast and scalable implementation of graph traversal on GPUs

Cited by: 1
Authors
Zhang, Yuan [1, 2]
Cao, Huawei [1, 3]
Liang, Yan [1, 2]
Zhang, Jie [1, 2]
Huang, Junying [1]
Ye, Xiaochun [1]
An, Xuejun [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Univ Chinese Acad Sci, Nanjing 211135, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
BFS; GPU-friendly CSR structure; Bidirectional 1d partition; UM-aware communication; ALGORITHMS;
DOI
10.1007/s42514-023-00155-x
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Graphs are one of the best ways to express and process association relationships, and they are widely used in applications such as social networks, fraud detection, and the Internet of Things. Breadth-First Search (BFS) is a typical graph traversal algorithm, but its performance on a GPU is unsatisfactory because of strong data dependencies, intensive irregular memory accesses, and low computational intensity. With multiple GPUs, the situation is even worse because of unbalanced data partitioning and high communication-to-computation ratios. In this paper, we present FSGraph, a fast and scalable BFS implementation on GPUs. FSGraph introduces three optimization techniques: a GPU-friendly Compressed Sparse Row (CSR) structure, a bidirectional one-dimensional (1d) partition, and UM-aware communication. We evaluated FSGraph with extensive experiments on four T4 and four V100 GPUs. On four T4 GPUs, the average BFS performance is 132.67 Giga-Traversed Edges per Second (GTEPS), up to a 1.44x improvement over a single T4. On four V100 GPUs, BFS reaches 392.35 GTEPS and outperforms an existing 1024-node CPU-based cluster on the November 2022 Graph500 list.
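For readers unfamiliar with the terms in the abstract, the following is a minimal CPU-side sketch of level-synchronous BFS over a standard CSR graph, together with a simple GTEPS calculation. It is not FSGraph's implementation: the GPU-friendly CSR variant, the bidirectional 1d partition, and UM-aware communication are not reproduced here. The names `bfs_csr`, `gteps`, `row_offsets`, and `col_indices` are illustrative assumptions, and the edge count is a plain tally of scanned adjacency entries rather than the Graph500 counting rules.

```python
def bfs_csr(row_offsets, col_indices, source):
    """Level-synchronous BFS on a standard CSR graph (illustrative sketch).

    row_offsets[u] .. row_offsets[u + 1] delimits the neighbors of vertex u
    inside col_indices. Returns (depth, edges_scanned), where depth[v] is the
    BFS level of v, or -1 if v is unreachable from source.
    """
    num_vertices = len(row_offsets) - 1
    depth = [-1] * num_vertices
    depth[source] = 0
    frontier = [source]
    edges_scanned = 0
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        for u in frontier:
            # Scan the contiguous neighbor range of u in the CSR arrays.
            for e in range(row_offsets[u], row_offsets[u + 1]):
                edges_scanned += 1
                v = col_indices[e]
                if depth[v] == -1:  # first visit: v belongs to the next level
                    depth[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return depth, edges_scanned


def gteps(edges, seconds):
    """Giga-traversed edges per second for a given edge count and runtime."""
    return edges / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical 5-vertex undirected graph: 0-1, 0-2, 1-3, 2-3, 3-4.
    row_offsets = [0, 2, 4, 6, 9, 10]
    col_indices = [1, 2, 0, 3, 0, 3, 1, 2, 4, 3]
    depth, edges_scanned = bfs_csr(row_offsets, col_indices, source=0)
    print(depth, edges_scanned)
```

The irregular reads of `col_indices` and the per-vertex variation in neighbor-range length are the kind of memory-access and load-balance behavior the abstract identifies as limiting BFS on GPUs.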
Pages: 277-291
Page count: 15
Related papers
50 in total
  • [1] FSGraph: fast and scalable implementation of graph traversal on GPUs
    Yuan Zhang
    Huawei Cao
    Yan Liang
    Jie Zhang
    Junying Huang
    Xiaochun Ye
    Xuejun An
    CCF Transactions on High Performance Computing, 2023, 5 : 277 - 291
  • [2] Scalable GPU Graph Traversal
    Merrill, Duane
    Garland, Michael
    Grimshaw, Andrew
    ACM SIGPLAN NOTICES, 2012, 47 (08) : 117 - 127
  • [3] Self-adaptive Graph Traversal on GPUs
    Sha, Mo
    Li, Yuchen
    Tan, Kian-Lee
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 1558 - 1570
  • [4] Scalable Graph Sampling on GPUs with Compressed Graph
    Yin, Hongbo
    Shao, Yingxia
    Miao, Xupeng
    Li, Yawen
    Cui, Bin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 2383 - 2392
  • [5] Enterprise: Breadth-First Graph Traversal on GPUs
    Liu, Hang
    Huang, H. Howie
    PROCEEDINGS OF SC15: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2015,
  • [6] GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs
    Kim, Min-Soo
    An, Kyuhyeon
    Park, Himchan
    Seo, Hyunseok
    Kim, Jinwook
    SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, : 447 - 461
  • [7] Efficient Load Balancing Techniques for Graph Traversal Applications on GPUs
    Busato, Federico
    Bombieri, Nicola
    EURO-PAR 2018: PARALLEL PROCESSING, 2018, 11014 : 628 - 641
  • [8] SRelation: Fast RDF Graph Traversal
    Mojzis, Jan
    Laclavik, Michal
    KNOWLEDGE ENGINEERING AND THE SEMANTIC WEB (KESW 2013), 2013, 394 : 69 - 82
  • [9] Scalable and Fast Lazy Persistency on GPUs
    Yudha, Ardhi Wiratama Baskara
    Kimura, Keiji
    Zhou, Huiyang
    Solihin, Yan
    2020 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2020), 2020, : 252 - 263
  • [10] Efficient and Scalable Graph Pattern Mining on GPUs
    Chen, Xuhao
    Arvind
    PROCEEDINGS OF THE 16TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, OSDI 2022, 2022, : 857 - 877