tpSpMV: A two-phase large-scale sparse matrix-vector multiplication kernel for manycore architectures

Cited by: 10
Authors
Chen, Yuedan [1 ,2 ]
Xiao, Guoqing [1 ,2 ]
Wu, Fan [1 ,2 ]
Tang, Zhuo [1 ,2 ]
Li, Keqin [1 ,2 ,3 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Natl Supercomp Ctr Changsha, Changsha 410082, Hunan, Peoples R China
[3] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
Funding
National Natural Science Foundation of China
Keywords
CSR; Manycore; Parallelization; Sparse matrix-vector multiplication (SpMV); SW26010
DOI
10.1016/j.ins.2020.03.020
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Sparse matrix-vector multiplication (SpMV) is an important subroutine in numerical linear algebra that is widely used in many large-scale applications. Accelerating SpMV on multicore and manycore architectures via row-wise parallelization of the Compressed Sparse Row (CSR) format is one of the most popular approaches. However, parallel CSR-based SpMV faces three main challenges: (a) the limited local memory of each computing unit can be overwhelmed when long rows of a large-scale sparse matrix are assigned to it; (b) irregular accesses to the input vector incur high memory access latency; and (c) the sparse data structure leads to low bandwidth utilization. This paper proposes a two-phase large-scale SpMV kernel, called tpSpMV, designed around the memory structure and computing architecture of multicore and manycore platforms to alleviate these three difficulties. First, we propose a two-phase parallel execution technique that splits parallel CSR-based SpMV into two separate phases to overcome the computational-scale limitation. Second, we propose adaptive partitioning methods and parallelization designs for each of the two phases, using local memory caching, to exploit the architectural advantages of high-performance computing platforms and reduce memory access latency. Third, we design several optimizations, such as data reduction, aligned memory accesses, and pipelining, to improve bandwidth utilization and further tune tpSpMV's performance. Experimental results on the SW26010 CPUs of the Sunway TaihuLight supercomputer show that tpSpMV achieves speedups of up to 28.61x and yields an average performance improvement of 13.16% over the state-of-the-art work. (C) 2020 Elsevier Inc. All rights reserved.
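To make the two-phase idea concrete, here is a minimal sketch in C of a baseline row-wise CSR SpMV kernel next to a simplified two-phase split (a multiply phase over all nonzeros, then a per-row reduction phase). This illustrates the general technique the abstract describes, not the authors' SW26010 implementation: the function names (spmv_csr, spmv_csr_two_phase), the serial loops standing in for parallel work units, and the intermediate prod array are all assumptions made for illustration.

```c
#include <stdlib.h>

/* Baseline: row-wise CSR SpMV, y = A * x.
 * row_ptr has n_rows + 1 entries; col_idx and val have nnz entries. */
void spmv_csr(int n_rows, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < n_rows; i++) {       /* one task per row */
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            sum += val[j] * x[col_idx[j]];   /* irregular gather from x */
        y[i] = sum;
    }
}

/* Simplified two-phase variant (illustrative, not the paper's code).
 * Phase 1 forms all nonzero products over the flat range [0, nnz), so
 * the work can be split into fixed-size slices and no single work unit
 * is ever assigned an overlong row; phase 2 reduces each row's
 * contiguous product range into y. */
void spmv_csr_two_phase(int n_rows, int nnz, const int *row_ptr,
                        const int *col_idx, const double *val,
                        const double *x, double *y)
{
    double *prod = malloc((size_t)nnz * sizeof *prod);
    if (prod == NULL)
        return;   /* out of memory; a real kernel would report this */

    /* Phase 1: elementwise multiply. Any fixed-size slice of this loop
     * fits a bounded local-memory budget, regardless of row lengths. */
    for (int j = 0; j < nnz; j++)
        prod[j] = val[j] * x[col_idx[j]];

    /* Phase 2: per-row reduction over contiguous ranges of prod; these
     * are sequential, alignment-friendly reads rather than gathers. */
    for (int i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            sum += prod[j];
        y[i] = sum;
    }

    free(prod);
}
```

The split trades an extra pass over an nnz-sized intermediate array for balanced phase-1 work and regular phase-2 traffic; keeping that extra traffic cheap is presumably what the data-reduction, aligned-access, and pipelining optimizations mentioned in the abstract target.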
Pages: 279 - 295
Number of pages: 17
Related Papers
50 records in total
  • [1] Optimization of Large-Scale Sparse Matrix-Vector Multiplication on Multi-GPU Systems
    Gao, Jianhua
    Ji, Weixing
    Wang, Yizhuo
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2024, 21 (04)
  • [2] Performance evaluation of the sparse matrix-vector multiplication on modern architectures
    Goumas, Georgios
    Kourtis, Kornilios
    Anastopoulos, Nikos
    Karakasis, Vasileios
    Koziris, Nectarios
    JOURNAL OF SUPERCOMPUTING, 2009, 50 (01) : 36 - 77
  • [3] Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
    Monakov, Alexander
    Lokhmotov, Anton
    Avetisyan, Arutyun
    HIGH PERFORMANCE EMBEDDED ARCHITECTURES AND COMPILERS, PROCEEDINGS, 2010, 5952 : 111+
  • [4] Scale-Free Sparse Matrix-Vector Multiplication on Many-Core Architectures
    Liang, Yun
    Tang, Wai Teng
    Zhao, Ruizhe
    Lu, Mian
    Huynh Phung Huynh
    Goh, Rick Siow Mong
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2017, 36 (12) : 2106 - 2119
  • [5] Structured sparse matrix-vector multiplication on massively parallel SIMD architectures
    Dehn, T
    Eiermann, M
    Giebermann, K
    Sperling, V
    PARALLEL COMPUTING, 1995, 21 (12) : 1867 - 1894
  • [6] Giga-scale Kernel Matrix-Vector Multiplication on GPU
    Hu, Robert
    Chau, Siu Lun
    Sejdinovic, Dino
    Glaunes, Joan Alexis
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [7] Conflict-Free Symmetric Sparse Matrix-Vector Multiplication on Multicore Architectures
    Elafrou, Athena
    Goumas, Georgios
    Koziris, Nectarios
    PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2019
  • [8] A Comprehensive Performance Model of Sparse Matrix-Vector Multiplication to Guide Kernel Optimization
    Xia, Tian
    Fu, Gelin
    Li, Chenyang
    Luo, Zhongpei
    Zhang, Lucheng
    Chen, Ruiyang
    Zhao, Wenzhe
    Zheng, Nanning
    Ren, Pengju
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (02) : 519 - 534
  • [9] Complex-valued matrix-vector multiplication system for a large-scale optical FFT
    Cao, Ziyu
    Zhang, Wenkai
    Zhou, Hailong
    Dong, Jianji
    Zhang, Xinliang
    OPTICS LETTERS, 2023, 48 (22) : 5871 - 5874