Dynamic Load Balancing of Matrix-Vector Multiplications on Roadrunner Compute Nodes

Cited by: 0

Authors:

Sancho, Jose Carlos [1]

Kerbyson, Darren J. [1]

Affiliations:

[1] Los Alamos Natl Lab, PAL, Los Alamos, NM 87545 USA

Source:

EURO-PAR 2009: PARALLEL PROCESSING, PROCEEDINGS | 2009 / Vol. 5704

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

Hybrid architectures that combine general purpose processors with accelerators are currently being adopted in several large-scale systems such as the Petaflop Roadrunner supercomputer at Los Alamos. In this system, dual-core Opteron host processors are tightly coupled with PowerXCell 8i accelerator processors within each compute node. In this kind of hybrid architecture, an accelerated mode of operation is typically used to off-load performance hotspots in the computation to the accelerators. In this paper we explore the suitability of a variant of this acceleration mode in which the performance hotspots are actually shared between the host and the accelerators. To achieve this, we have designed a new load balancing algorithm, which is optimized for the Roadrunner compute nodes, to dynamically distribute computation and associated data between the host and the accelerators at runtime. Results are presented using this approach, for sparse and dense matrix-vector multiplications, that show load balancing can improve performance by up to 24% over solely using the accelerators.
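
The abstract does not describe the algorithm itself, so the following is only a minimal illustrative sketch of the general idea of sharing a matrix-vector product between two processing units and adjusting the split at runtime. It is not the authors' method. The function names (`matvec`, `split_matvec`, `rebalance`), the row-wise partitioning, and the constant-throughput assumption are all hypothetical choices made for illustration, and both "host" and "accelerator" are simulated by the same Python routine.

```python
def matvec(rows, x):
    """Dense matrix-vector product for a list of rows (pure Python)."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def split_matvec(matrix, x, host_fraction):
    """Compute y = A x with the leading rows assigned to the 'host'
    and the remaining rows assigned to the 'accelerator'.

    Both parts run through the same routine here; on real hardware
    they would be dispatched to different processors.
    """
    n_host = round(host_fraction * len(matrix))
    y_host = matvec(matrix[:n_host], x)    # stand-in for host work
    y_accel = matvec(matrix[n_host:], x)   # stand-in for accelerator work
    return y_host + y_accel, n_host


def rebalance(host_fraction, t_host, t_accel):
    """Return a new host share chosen so that both sides would finish
    together, assuming each side's throughput stays constant
    (an assumption of this sketch, not a claim from the paper).
    """
    if t_host <= 0 or t_accel <= 0:
        return host_fraction  # no usable timing; keep the current split
    rate_host = host_fraction / t_host            # share of work per second
    rate_accel = (1.0 - host_fraction) / t_accel
    return rate_host / (rate_host + rate_accel)
```

In use, one would time the two parts of each multiplication and feed the measurements to `rebalance` before the next iteration. With a share of 0.5 and the host taking twice as long as the accelerator, the sketch moves the host share to one third.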

Pages: 166-177

Page count: 12
