Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations

Cited by: 4
Authors
Ji, Haonan [1 ]
Lu, Shibo [1 ]
Hou, Kaixi [2 ]
Wang, Hao [3 ]
Jin, Zhou [1 ]
Liu, Weifeng [1 ]
Vinter, Brian [4 ]
Affiliations
[1] China Univ Petr, Dept Comp Sci & Technol, Super Sci Software Lab, Beijing, Peoples R China
[2] Virginia Tech, Dept Comp Sci, Blacksburg, VA USA
[3] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[4] Aarhus Univ, Fac Tech Sci, Aarhus C, Denmark
Funding
National Natural Science Foundation of China;
Keywords
Parallel computing; Segmented merge; Sparse matrix; GPU; MANY-CORE; MULTIPLICATION;
DOI
10.1007/s10766-021-00695-1
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Segmented operations, such as segmented sum, segmented scan and segmented sort, are important building blocks for parallel irregular algorithms. In this work we propose a new parallel primitive called segmented merge. It merges q sub-segments into p segments in parallel, where both segments and sub-segments may have nonuniform lengths that easily cause load-balancing and vectorization problems on massively parallel processors such as GPUs. Our algorithm resolves these problems by first recording the boundaries of segments and sub-segments, then assigning roughly the same number of elements to each GPU thread, and finally merging the sub-segments within each segment iteratively in a binary-tree fashion until only one sub-segment remains per segment. We implement the segmented merge primitive on GPUs and demonstrate its efficiency on parallel sparse matrix transposition (SpTRANS) and sparse matrix-matrix multiplication (SpGEMM) operations. We conduct a comparative experiment against the NVIDIA vendor library on two GPUs. The experimental results show that our algorithm achieves on average 3.94x (up to 13.09x) and 2.89x (up to 109.15x) speedup on SpTRANS and SpGEMM, respectively.
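The binary-tree merging step described in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' GPU implementation (which additionally records boundaries and balances elements across threads); it is only a hypothetical model, assuming each sub-segment is already sorted, showing how sub-segments within each segment are merged pairwise over successive rounds until one sorted sub-segment per segment remains.

```python
def merge_two(a, b):
    """Merge two sorted lists into one sorted list (classic two-pointer merge)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def segmented_merge(segments):
    """segments: list of p segments, each a list of sorted sub-segments.
    Returns one fully merged (sorted) list per segment."""
    result = []
    for subs in segments:
        level = list(subs)
        # Binary-tree rounds: each round halves the number of sub-segments.
        while len(level) > 1:
            nxt = []
            for k in range(0, len(level) - 1, 2):
                nxt.append(merge_two(level[k], level[k + 1]))
            if len(level) % 2:          # odd sub-segment carried to next round
                nxt.append(level[-1])
            level = nxt
        result.append(level[0] if level else [])
    return result
```

On a GPU, the pairwise merges inside one round are independent and can run concurrently, which is what makes the binary-tree formulation attractive; for example, `segmented_merge([[[1, 4], [2, 3], [5]], [[7], [6]]])` merges two segments of three and two sub-segments, respectively.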
Pages: 732-744
Page count: 13
Related Papers
50 records
  • [1] Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations
    Haonan Ji
    Shibo Lu
    Kaixi Hou
    Hao Wang
    Zhou Jin
    Weifeng Liu
    Brian Vinter
    International Journal of Parallel Programming, 2021, 49 : 732 - 744
  • [2] SPARSE-MATRIX COMPUTATIONS ON PARALLEL PROCESSOR ARRAYS
    OGIELSKI, AT
    AIELLO, W
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1993, 14 (03): : 519 - 530
  • [3] Sparse matrix computations on bulk synchronous parallel computers
    Bisseling, RH
    ZEITSCHRIFT FUR ANGEWANDTE MATHEMATIK UND MECHANIK, 1996, 76 : 127 - 130
  • [4] Merge-based Parallel Sparse Matrix-Vector Multiplication
    Merrill, Duane
    Garland, Michael
    SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2016, : 678 - 689
  • [5] A sparse parallel hybrid Monte Carlo algorithm for matrix computations
    Branford, S
    Weihrauch, C
    Alexandrov, V
    COMPUTATIONAL SCIENCE - ICCS 2005, PT 3, 2005, 3516 : 743 - 751
  • [6] Parallel sparse matrix computations in the industrial strength PINEAPL library
    Krommer, AR
    APPLIED PARALLEL COMPUTING: LARGE SCALE SCIENTIFIC AND INDUSTRIAL PROBLEMS, 1998, 1541 : 281 - 285
  • [7] Merge-based Parallel Sparse Matrix-Sparse Vector Multiplication with a Vector Architecture
    Li, Haoran
    Yokoyama, Harumichi
    Araki, Takuya
    IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2018, : 43 - 50
  • [8] Ordering unstructured meshes for sparse matrix computations on leading parallel systems
    Oliker, L
    Li, XY
    Heber, G
    Biswas, R
    PARALLEL AND DISTRIBUTED PROCESSING, PROCEEDINGS, 2000, 1800 : 497 - 503
  • [9] Multi-pass mapping schemes for parallel sparse matrix computations
    Malkowski, K
    Raghavan, P
    COMPUTATIONAL SCIENCE - ICCS 2005, PT 1, PROCEEDINGS, 2005, 3514 : 245 - 255
  • [10] Parallel sparse matrix computations using the PINEAPL Library: A performance study
    Krommer, AR
    EURO-PAR '98 PARALLEL PROCESSING, 1998, 1470 : 804 - 811