tpSpMV: A two-phase large-scale sparse matrix-vector multiplication kernel for manycore architectures

Cited by: 10
Authors
Chen, Yuedan [1 ,2 ]
Xiao, Guoqing [1 ,2 ]
Wu, Fan [1 ,2 ]
Tang, Zhuo [1 ,2 ]
Li, Keqin [1 ,2 ,3 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Natl Supercomp Ctr Changsha, Changsha 410082, Hunan, Peoples R China
[3] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
Funding
National Natural Science Foundation of China
Keywords
CSR; Manycore; Parallelization; Sparse matrix-vector multiplication (SpMV); SW26010
DOI
10.1016/j.ins.2020.03.020
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Sparse matrix-vector multiplication (SpMV) is an important subroutine in numerical linear algebra that is widely used in many large-scale applications. Accelerating SpMV on multicore and manycore architectures via row-wise parallelization of the Compressed Sparse Row (CSR) format is one of the most popular approaches. However, parallel CSR-based SpMV faces three main challenges: (a) the limited local memory of each computing unit can be overwhelmed when long rows of a large-scale sparse matrix are assigned to it; (b) irregular accesses to the input vector incur high memory access latency; and (c) the sparse data structure leads to low bandwidth utilization. This paper proposes a two-phase large-scale SpMV kernel, called tpSpMV, designed around the memory structure and computing architecture of multicore and manycore platforms to alleviate these three difficulties. First, we propose a two-phase parallel execution technique that splits parallel CSR-based SpMV into two separate phases to overcome the computational-scale limitation. Second, we propose adaptive partitioning methods and parallelization designs for each of the two phases, using local memory caching, to exploit the architectural advantages of high-performance computing platforms and reduce memory access latency. Third, we design several optimizations, such as data reduction, aligned memory accesses, and pipelining, to improve bandwidth utilization and further tune tpSpMV's performance. Experimental results on the SW26010 CPUs of the Sunway TaihuLight supercomputer show that tpSpMV achieves speedups of up to 28.61x and yields an average performance improvement of 13.16% over the state-of-the-art work. (C) 2020 Elsevier Inc. All rights reserved.
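To make the two-phase idea concrete, here is a minimal sketch in C of a baseline row-wise CSR SpMV kernel next to a simplified two-phase split (a multiply phase over all nonzeros, then a per-row reduction phase). This illustrates the general technique the abstract describes, not the authors' SW26010 implementation: the function names (spmv_csr, spmv_csr_two_phase), the serial loops standing in for parallel work units, and the intermediate prod array are all assumptions made for illustration.

```c
#include <stdlib.h>

/* Baseline: row-wise CSR SpMV, y = A * x.
 * row_ptr has n_rows + 1 entries; col_idx and val have nnz entries. */
void spmv_csr(int n_rows, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < n_rows; i++) {       /* one task per row */
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            sum += val[j] * x[col_idx[j]];   /* irregular gather from x */
        y[i] = sum;
    }
}

/* Simplified two-phase variant (illustrative, not the paper's code).
 * Phase 1 forms all nonzero products over the flat range [0, nnz), so
 * the work can be split into fixed-size slices and no single work unit
 * is ever assigned an overlong row; phase 2 reduces each row's
 * contiguous product range into y. */
void spmv_csr_two_phase(int n_rows, int nnz, const int *row_ptr,
                        const int *col_idx, const double *val,
                        const double *x, double *y)
{
    double *prod = malloc((size_t)nnz * sizeof *prod);
    if (prod == NULL)
        return;   /* out of memory; a real kernel would report this */

    /* Phase 1: elementwise multiply. Any fixed-size slice of this loop
     * fits a bounded local-memory budget, regardless of row lengths. */
    for (int j = 0; j < nnz; j++)
        prod[j] = val[j] * x[col_idx[j]];

    /* Phase 2: per-row reduction over contiguous ranges of prod; these
     * are sequential, alignment-friendly reads rather than gathers. */
    for (int i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            sum += prod[j];
        y[i] = sum;
    }

    free(prod);
}
```

The split trades an extra pass over an nnz-sized intermediate array for balanced phase-1 work and regular phase-2 traffic; keeping that extra traffic cheap is presumably what the data-reduction, aligned-access, and pipelining optimizations mentioned in the abstract target.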
Pages: 279 - 295
Number of pages: 17
Related Papers
50 records in total
  • [1] Optimization of Large-Scale Sparse Matrix-Vector Multiplication on Multi-GPU Systems
    Gao, Jianhua
    Ji, Weixing
    Wang, Yizhuo
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2024, 21 (04)
  • [2] Performance evaluation of the sparse matrix-vector multiplication on modern architectures
    Goumas, Georgios
    Kourtis, Kornilios
    Anastopoulos, Nikos
    Karakasis, Vasileios
    Koziris, Nectarios
    JOURNAL OF SUPERCOMPUTING, 2009, 50 (01) : 36 - 77
  • [3] Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
    Monakov, Alexander
    Lokhmotov, Anton
    Avetisyan, Arutyun
    HIGH PERFORMANCE EMBEDDED ARCHITECTURES AND COMPILERS, PROCEEDINGS, 2010, 5952 : 111+
  • [4] Scale-Free Sparse Matrix-Vector Multiplication on Many-Core Architectures
    Liang, Yun
    Tang, Wai Teng
    Zhao, Ruizhe
    Lu, Mian
    Huynh Phung Huynh
    Goh, Rick Siow Mong
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2017, 36 (12) : 2106 - 2119
  • [5] Structured sparse matrix-vector multiplication on massively parallel SIMD architectures
    Dehn, T
    Eiermann, M
    Giebermann, K
    Sperling, V
    PARALLEL COMPUTING, 1995, 21 (12) : 1867 - 1894
  • [6] Giga-scale Kernel Matrix-Vector Multiplication on GPU
    Hu, Robert
    Chau, Siu Lun
    Sejdinovic, Dino
    Glaunes, Joan Alexis
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [7] Conflict-Free Symmetric Sparse Matrix-Vector Multiplication on Multicore Architectures
    Elafrou, Athena
    Goumas, Georgios
    Koziris, Nectarios
    PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2019
  • [8] A Comprehensive Performance Model of Sparse Matrix-Vector Multiplication to Guide Kernel Optimization
    Xia, Tian
    Fu, Gelin
    Li, Chenyang
    Luo, Zhongpei
    Zhang, Lucheng
    Chen, Ruiyang
    Zhao, Wenzhe
    Zheng, Nanning
    Ren, Pengju
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (02) : 519 - 534
  • [9] Complex-valued matrix-vector multiplication system for a large-scale optical FFT
    Cao, Ziyu
    Zhang, Wenkai
    Zhou, Hailong
    Dong, Jianji
    Zhang, Xinliang
    OPTICS LETTERS, 2023, 48 (22) : 5871 - 5874