Optimization of GPU-based Sparse Matrix Multiplication for Large Sparse Networks

Cited by: 11
Authors
Lee, Jeongmyung [1 ]
Kang, Seokwon [1 ]
Yu, Yongseung [1 ]
Jo, Yong-Yeon [1 ]
Kim, Sang-Wook [1 ]
Park, Yongjun [1 ]
Affiliations
[1] Hanyang Univ, Dept Comp Sci, Seoul, South Korea
Source
2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020), 2020
Keywords
Sparse matrix multiplication; sparse network; GPU; linear algebra; EFFICIENT
DOI
10.1109/ICDE48307.2020.00085
CLC Number
TP [Automation and Computer Technology]
Subject Classification Code
0812
Abstract
Sparse matrix multiplication (spGEMM) is widely used to analyze sparse network data and to extract important information based on the matrix representation. Because it exhibits a high degree of data parallelism, many efficient implementations on graphics processing units (GPUs) have been introduced using data-parallel programming platforms such as CUDA and OpenCL. However, well-known spGEMM implementations such as cuSPARSE and CUSP often do not utilize GPU resources fully, owing to load imbalance between threads in the expansion process and high memory contention in the merge process. Furthermore, although several outer-product-based spGEMM techniques have been proposed to solve the load-balancing problem in expansion, they still underutilize the GPU because severe computation-load variations exist among thread blocks. To address these challenges, this paper proposes a new optimization pass called Block Reorganizer, which balances the total computation of each computing unit on the target GPU, based on the outer-product-based expansion process, and reduces memory pressure during the merge process. For expansion, it first identifies the actual computation amount of each block and then performs two thread-block transformations according to block characteristics: 1) B-Splitting, which transforms a heavy-computation block into multiple small blocks, and 2) B-Gathering, which aggregates multiple small-computation blocks into a larger block. During merging, it improves overall performance by performing B-Limiting, which limits the number of blocks on each computing unit. Experimental results on real-world datasets show that Block Reorganizer improves total kernel-execution performance by 1.43x on average over row-product-based spGEMM on NVIDIA Titan Xp GPUs.
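The abstract gives no code, but a minimal CUDA sketch of the expansion phase may make the load-imbalance argument concrete. The kernel below assumes A stored in CSC and B in CSR, so the k-th outer product pairs column k of A with row k of B; all names (expand_outer_products, out_offset, and so on) are illustrative assumptions, not the authors' implementation.

```cuda
// Minimal sketch (not the paper's code): expansion phase of
// outer-product-based spGEMM. One thread block expands one outer
// product a(:,k) * b(k,:) and writes its partial products as
// (row, col, val) triples into a segment starting at out_offset[k],
// which would be precomputed on the host by an exclusive prefix sum
// over a_len * b_len for every k.
#include <cuda_runtime.h>

__global__ void expand_outer_products(
    const int *a_colptr, const int *a_row, const float *a_val,  // A in CSC
    const int *b_rowptr, const int *b_col, const float *b_val,  // B in CSR
    const int *out_offset,                                      // per-k segment starts
    int *out_row, int *out_col, float *out_val)                 // expanded triples
{
    int k     = blockIdx.x;  // one outer product per thread block
    int a_beg = a_colptr[k], a_len = a_colptr[k + 1] - a_beg;
    int b_beg = b_rowptr[k], b_len = b_rowptr[k + 1] - b_beg;
    int base  = out_offset[k];

    // The k-th outer product yields a_len * b_len partial products;
    // threads stride over that 2-D space flattened to 1-D.
    for (int t = threadIdx.x; t < a_len * b_len; t += blockDim.x) {
        int i = t / b_len;   // position within column k of A
        int j = t % b_len;   // position within row k of B
        out_row[base + t] = a_row[a_beg + i];
        out_col[base + t] = b_col[b_beg + j];
        out_val[base + t] = a_val[a_beg + i] * b_val[b_beg + j];
    }
}
```

Because block k performs a_len * b_len multiplications, per-block work can vary by orders of magnitude on real-world networks with skewed degree distributions; this is exactly the imbalance the abstract describes B-Splitting (spreading one heavy k over several thread blocks) and B-Gathering (batching several light k's into one block) as removing.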
Pages: 925-936
Page count: 12
Related Papers
50 records in total
  • [31] Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU
    Zhao, Zhixiang
    Wu, Yanxia
    Zhang, Guoyin
    Yang, Yiqing
    Hong, Ruize
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2025, 37 (4-5):
  • [32] Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
    Monakov, Alexander
    Lokhmotov, Anton
    Avetisyan, Arutyun
    HIGH PERFORMANCE EMBEDDED ARCHITECTURES AND COMPILERS, PROCEEDINGS, 2010, 5952 : 111 - +
  • [33] GPU-based Multifrontal Optimizing Method in Sparse Cholesky Factorization
    Zheng, Ran
    Wang, Wei
    Jin, Hai
    Wu, Song
    Chen, Yong
    Jiang, Han
    PROCEEDINGS OF THE ASAP2015 2015 IEEE 26TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, 2015, : 90 - 97
  • [34] Cache performance optimization of irregular sparse matrix multiplication on modern multi-core CPU and GPU
    Liu, Li
    Yang, Guangwen
    HIGH TECHNOLOGY LETTERS, 2013, 19 (04) : 339 - 345
  • [35] GPU-ACCELERATED SPARSE MATRIX-MATRIX MULTIPLICATION BY ITERATIVE ROW MERGING
    Gremse, Felix
    Hoefter, Andreas
    Schwen, Lars Ole
    Kiessling, Fabian
    Naumann, Uwe
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (01) : C54 - C71
  • [36] spECK: Accelerating GPU Sparse Matrix-Matrix Multiplication through Lightweight Analysis
    Parger, Mathias
    Winter, Martin
    Mlakar, Daniel
    Steinberger, Markus
    PROCEEDINGS OF THE 25TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '20), 2020, : 362 - 375
  • [37] Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures
    Deveci, Mehmet
    Trott, Christian
    Rajamanickam, Sivasankaran
    PARALLEL COMPUTING, 2018, 78 : 33 - 46
  • [38] Sparse Matrix Sparse Vector Multiplication - A Novel Approach
    Shah, Monika
    2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS, 2015, : 67 - 73
  • [39] A New Segmentation-Based GPU-Accelerated Sparse Matrix-Vector Multiplication
    He, Kai
    Tan, Sheldon X-D
    Tlelo-Cuautle, Esteban
    Wang, Hai
    Tang, He
    2014 IEEE 57TH INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2014, : 1013 - 1016
  • [40] Coded Sparse Matrix Multiplication
    Wang, Sinong
    Liu, Jiashang
    Shroff, Ness
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80