High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning

Cited by: 34
Authors
Cevahir, Ali [1]
Nukada, Akira [3]
Matsuoka, Satoshi [2, 4]
Affiliations
[1] Tokyo Inst Technol, Dept Math & Comp Sci, Meguro Ku, Ookayama 2-12-1, Tokyo 1528552, Japan
[2] Tokyo Inst Technol, Natl Inst Informat, JST CREST, Chiyoda Ku, Tokyo 1018430, Japan
[3] Tokyo Inst Technol, Meguro Ku, Tokyo 1528552, Japan
[4] Natl Inst Informat, Chiyoda Ku, Tokyo 1018430, Japan
Funding
Japan Science and Technology Agency
Keywords
GPU computing; GPU cluster; Conjugate Gradients; Hypergraph partitioning
DOI
10.1007/s00450-010-0112-6
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Motivated by the high computational power and low price-to-performance ratio of GPUs, GPU-accelerated clusters are being built for high-performance scientific computing. In this work, we propose a scalable implementation of a Conjugate Gradient (CG) solver for unstructured matrices on a GPU-extended cluster, where each cluster node has multiple GPUs. The basic computations of the solver are performed on the GPUs, and communication is managed by the CPU. For sparse matrix-vector multiplication, which is the most time-consuming operation, the solver selects the fastest of several high-performance kernels running on the GPUs. Scalability is harder to obtain on a GPU-extended cluster than on a traditional CPU cluster: because GPUs compute much faster than CPUs, GPU-extended clusters demand faster communication between compute units. To achieve scalability, we adopt hypergraph-partitioning models, which are state-of-the-art models for communication reduction and load balancing in parallel sparse iterative solvers. We implement a hierarchical partitioning model that is better suited to the underlying heterogeneous system. In our experiments, we obtain up to 94 Gflops double-precision CG performance using 64 NVIDIA Tesla GPUs on 32 nodes.
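For readers unfamiliar with the method named in the abstract, the following is a minimal single-process sketch of the textbook Conjugate Gradient iteration, with the sparse matrix-vector product written out for a matrix in CSR format. It is not the authors' implementation: it runs on the CPU in pure Python, involves no GPUs, partitioning, or communication, and every function and parameter name (`csr_spmv`, `conjugate_gradient`, `tol`, `max_iter`) is an assumption made for illustration. The matrix is assumed to be symmetric positive definite.

```python
import math


def csr_spmv(indptr, indices, data, x):
    """Return y = A @ x for a square matrix A stored in CSR format."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (hypothetical helper).

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    if math.sqrt(rs_old) <= tol:
        return x, 0
    for iteration in range(1, max_iter + 1):
        ap = csr_spmv(indptr, indices, data, p)   # dominant cost per iteration
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if math.sqrt(rs_new) <= tol:
            return x, iteration
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# Example: A = [[4, 1], [1, 3]], b = [1, 2]; the exact solution is [1/11, 7/11].
solution, iterations = conjugate_gradient(
    [0, 2, 4], [0, 1, 0, 1], [4.0, 1.0, 1.0, 3.0], [1.0, 2.0]
)
```

Each iteration performs one sparse matrix-vector product and two inner products. In a distributed setting the product needs vector entries owned by other processes and the inner products need global reductions, which is where the communication discussed in the abstract arises.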
Pages: 83-91
Page count: 9