tpSpMV: A two-phase large-scale sparse matrix-vector multiplication kernel for manycore architectures

Times cited: 10
Authors
Chen, Yuedan [1, 2]
Xiao, Guoqing [1, 2]
Wu, Fan [1, 2]
Tang, Zhuo [1, 2]
Li, Keqin [1, 2, 3]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Hunan, Peoples R China
[2] Natl Supercomp Ctr Changsha, Changsha 410082, Hunan, Peoples R China
[3] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
Funding
National Natural Science Foundation of China
Keywords
CSR; Manycore; Parallelization; Sparse matrix-vector multiplication (SpMV); SW26010; SPMV; OPTIMIZATION; SYSTEMS;
DOI
10.1016/j.ins.2020.03.020
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Sparse matrix-vector multiplication (SpMV) is an important subroutine in numerical linear algebra and is widely used in many large-scale applications. Accelerating SpMV based on the Compressed Sparse Row (CSR) format on multicore and manycore architectures via row-wise parallelization is one of the most popular research directions. However, optimizing parallel CSR-based SpMV poses three main challenges: (a) the limited local memory of each computing unit can be overwhelmed when it is assigned the long rows of large-scale sparse matrices; (b) irregular accesses to the input vector incur high memory access latency; and (c) the sparse data structure leads to low bandwidth utilization. To alleviate these three difficulties, this paper proposes tpSpMV, a two-phase large-scale SpMV designed around the memory structure and computing architecture of multicore and manycore processors. First, we propose a two-phase parallel execution technique that divides parallel CSR-based SpMV into two separate phases to overcome the computational scale limitation. Second, for each of the two phases we propose adaptive partitioning methods and parallelization designs that use local memory caching, in order to exploit the architectural advantages of high-performance computing platforms and alleviate high memory access latency. Third, we design several optimizations, such as data reduction, aligned memory access, and pipelining, to improve bandwidth utilization and the performance of tpSpMV. Experimental results on the SW26010 CPUs of the Sunway TaihuLight supercomputer show that tpSpMV achieves speedups of up to 28.61x and an average performance improvement of 13.16% over the state-of-the-art work. (C) 2020 Elsevier Inc. All rights reserved.
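To make the abstract's first challenge concrete, the sketch below contrasts a plain row-wise CSR SpMV with a generic two-phase variant in which work is cut into fixed-size chunks of nonzeros rather than whole rows, so that a long row can be spread over several chunks and its partial sums are merged in a second phase. This is only an illustration under our own assumptions: the function names and the `chunk_size` parameter (a stand-in for a computing unit's local-memory budget) are hypothetical, and the code is not the tpSpMV algorithm and does not model SW26010 specifics such as DMA transfers or local-memory caching.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Reference row-wise CSR SpMV: y[i] = sum of vals[k] * x[col_idx[k]] over row i."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def nonzero_rows(row_ptr):
    """Expand row_ptr into the row index of every nonzero."""
    rows = []
    for i in range(len(row_ptr) - 1):
        rows.extend([i] * (row_ptr[i + 1] - row_ptr[i]))
    return rows


def two_phase_spmv(row_ptr, col_idx, vals, x, chunk_size):
    """Hypothetical two-phase sketch.

    Phase 1 splits the nonzeros into chunks of at most chunk_size (an assumed
    local-memory budget) and emits (row, partial_sum) pairs per chunk.
    Phase 2 adds the partial sums into y, merging rows that span chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    rows = nonzero_rows(row_ptr)
    nnz = len(vals)
    partials = []
    # Phase 1: chunks are independent, so each could go to a separate core.
    for start in range(0, nnz, chunk_size):
        end = min(start + chunk_size, nnz)
        cur_row, acc = rows[start], 0.0
        for k in range(start, end):
            if rows[k] != cur_row:
                partials.append((cur_row, acc))
                cur_row, acc = rows[k], 0.0
            acc += vals[k] * x[col_idx[k]]
        partials.append((cur_row, acc))
    # Phase 2: reduce the partial sums; empty rows stay 0.0.
    y = [0.0] * (len(row_ptr) - 1)
    for r, s in partials:
        y[r] += s
    return y


# Matrix [[1, 0, 2], [0, 0, 0], [3, 4, 5]] in CSR form, multiplied by x = [1, 1, 1].
row_ptr = [0, 2, 2, 5]
col_idx = [0, 2, 0, 1, 2]
vals = [1, 2, 3, 4, 5]
x = [1, 1, 1]
print(csr_spmv(row_ptr, col_idx, vals, x))                      # [3.0, 0.0, 12.0]
print(two_phase_spmv(row_ptr, col_idx, vals, x, chunk_size=2))  # [3.0, 0.0, 12.0]
```

With `chunk_size=2`, the three nonzeros of row 2 fall into two different chunks, and phase 2 combines the partial sums 7.0 and 5.0. The design choice illustrated is that the per-chunk working set is bounded by `chunk_size` regardless of row length, at the cost of an extra reduction step.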
Pages: 279-295
Number of pages: 17