Dynamic Load Balancing of Matrix-Vector Multiplications on Roadrunner Compute Nodes

Cited by: 0
Authors
Sancho, Jose Carlos [1]
Kerbyson, Darren J. [1]
Affiliations
[1] Los Alamos Natl Lab, PAL, Los Alamos, NM 87545 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
Hybrid architectures that combine general-purpose processors with accelerators are currently being adopted in several large-scale systems, such as the petaflop Roadrunner supercomputer at Los Alamos. In this system, dual-core Opteron host processors are tightly coupled with PowerXCell 8i accelerator processors within each compute node. In this kind of hybrid architecture, an accelerated mode of operation is typically used to off-load performance hotspots in the computation to the accelerators. In this paper we explore the suitability of a variant of this acceleration mode in which the performance hotspots are actually shared between the host and the accelerators. To achieve this we have designed a new load balancing algorithm, optimized for the Roadrunner compute nodes, that dynamically distributes computation and associated data between the host and the accelerators at runtime. Results are presented using this approach for sparse and dense matrix-vector multiplications, showing that load balancing can improve performance by up to 24% over solely using the accelerators.
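The abstract does not describe the algorithm itself, so the following is only an illustrative sketch of the general idea of sharing a matrix-vector product between two devices and adjusting the split at runtime. It is not the authors' method. All names (`matvec_rows`, `rebalance`, `balanced_matvec`) are hypothetical, both "devices" are simulated by ordinary Python code run one after the other, and the proportional-throughput update rule is an assumption.

```python
import time


def matvec_rows(matrix, vector, start, stop):
    """Multiply rows [start, stop) of a dense matrix by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix[start:stop]]


def rebalance(fraction, host_time, accel_time, host_rows, accel_rows):
    """Return a host share proportional to measured row throughput.

    Assumed update rule: each side gets work in proportion to the rows
    per second it achieved on the previous step. If either side had no
    rows or no measurable time, the old fraction is kept.
    """
    if host_rows == 0 or accel_rows == 0 or host_time <= 0 or accel_time <= 0:
        return fraction
    host_rate = host_rows / host_time
    accel_rate = accel_rows / accel_time
    return host_rate / (host_rate + accel_rate)


def balanced_matvec(matrix, vector, host_fraction):
    """Compute matrix * vector with the rows split between two workers.

    The first `host_fraction` of the rows stands in for the host's share
    and the rest for the accelerator's share. Returns the full result and
    a suggested host fraction for the next call.
    """
    n = len(matrix)
    cut = min(n, max(0, round(n * host_fraction)))

    t0 = time.perf_counter()
    host_part = matvec_rows(matrix, vector, 0, cut)      # stand-in for host work
    t1 = time.perf_counter()
    accel_part = matvec_rows(matrix, vector, cut, n)     # stand-in for accelerator work
    t2 = time.perf_counter()

    new_fraction = rebalance(host_fraction, t1 - t0, t2 - t1, cut, n - cut)
    return host_part + accel_part, new_fraction


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    x = [1, 1]
    y, next_fraction = balanced_matvec(A, x, 0.5)
    print(y, next_fraction)
```

In a repeated computation, the fraction returned by one call would be passed to the next, so the split drifts toward whatever ratio equalizes the two sides' finishing times. A real hybrid node would run the two shares concurrently and would also have to account for the cost of moving data to the accelerator, which this sketch ignores.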
Pages: 166-177
Number of pages: 12