High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning

Cited by: 34

Authors:

Cevahir, Ali ^{[1]}

Nukada, Akira ^{[3]}

Matsuoka, Satoshi ^{[2,4]}

Affiliations:

[1] Tokyo Inst Technol, Dept Math & Comp Sci, Meguro Ku, Ookayama 2-12-1, Tokyo 1528552, Japan

[2] Tokyo Inst Technol, Natl Inst Informat, JST CREST, Chiyoda Ku, Tokyo 1018430, Japan

[3] Tokyo Inst Technol, Meguro Ku, Tokyo 1528552, Japan

[4] Natl Inst Informat, Chiyoda Ku, Tokyo 1018430, Japan

Source:

COMPUTER SCIENCE-RESEARCH AND DEVELOPMENT | 2010 / Vol. 25 / Issue 1-2

Funding:

Japan Science and Technology Agency;

Keywords:

GPU computing; GPU cluster; Conjugate Gradients; Hypergraph partitioning;

DOI:

10.1007/s00450-010-0112-6

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Motivated by the high computation power and low price-to-performance ratio of GPUs, GPU-accelerated clusters are being built for high-performance scientific computing. In this work, we propose a scalable implementation of a Conjugate Gradient (CG) solver for unstructured matrices on a GPU-extended cluster, where each cluster node has multiple GPUs. The basic computations of the solver are performed on GPUs, and communications are managed by the CPU. For sparse matrix-vector multiplication, which is the most time-consuming operation, the solver selects the fastest among several high-performance kernels running on GPUs. Scalability is harder to obtain in a GPU-extended cluster than in a traditional CPU cluster: because GPUs compute much faster than CPUs, GPU-extended clusters demand faster communication between compute units. To achieve scalability, we adopt hypergraph-partitioning models, which are state-of-the-art models for communication reduction and load balancing in parallel sparse iterative solvers. We implement a hierarchical partitioning model that better suits the underlying heterogeneous system. In our experiments, we obtain up to 94 Gflops double-precision CG performance using 64 NVIDIA Tesla GPUs on 32 nodes.
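
For readers unfamiliar with the method, the following is a minimal serial sketch of an unpreconditioned CG iteration built around a sparse matrix-vector product in CSR format. It is an illustration only, not the authors' multi-GPU implementation: it runs on a single CPU in pure Python, and the function names (`spmv_csr`, `dot`, `conjugate_gradient`) and the CSR storage choice are assumptions made for this example. It shows why the sparse matrix-vector product dominates the cost: it is the only step in each iteration that touches the matrix.

```python
import math


def spmv_csr(row_ptr, col_idx, values, x):
    """Return y = A @ x for a matrix A stored in CSR form (hypothetical helper)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(row_ptr, col_idx, values, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (assumed, not checked)."""
    x = [0.0] * len(b)
    r = list(b)          # residual r = b - A x, with x = 0 initially
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        q = spmv_csr(row_ptr, col_idx, values, p)   # the one SpMV per iteration
        alpha = rs_old / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Usage: the 2x2 system [[4, 1], [1, 3]] x = [1, 2] in CSR form.
solution = conjugate_gradient([0, 2, 4], [0, 1, 0, 1], [4.0, 1.0, 1.0, 3.0], [1.0, 2.0])
```

In a distributed setting, each dot product requires a global reduction and each matrix-vector product requires exchanging the vector entries that other partitions need, which is the communication the abstract's partitioning models aim to reduce.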


Pages: 83-91

Page count: 9
