An Implementation of Block Conjugate Gradient Algorithm on CPU-GPU Processors

Cited by: 4
Authors
Ji, Hao [1]
Sosonkina, Masha [2]
Li, Yaohang [1]
Affiliations
[1] Old Dominion University, Department of Computer Science, Norfolk, VA 23529, USA
[2] Old Dominion University, Department of Modeling, Simulation and Visualization Engineering, Norfolk, VA 23529, USA
Keywords
Block Conjugate Gradient; Multi-core CPU; Graphics Processing Unit; Intel Xeon Phi; Performance Evaluation
DOI
10.1109/Co-HPC.2014.10
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we investigate the implementation of the Block Conjugate Gradient (BCG) algorithm on CPU-GPU processors. By analyzing the performance of the various matrix operations in BCG, we identify the construction of new search direction matrices as the main performance bottleneck. Replacing the QR decomposition with the eigendecomposition of a small matrix remedies this problem by reducing the computational cost of generating orthogonal search directions. Moreover, we design a hybrid (offload) computing scheme that enables the BCG implementation to handle linear systems whose large, sparse coefficient matrices cannot fit in GPU memory. The hybrid scheme offloads matrix operations to the GPU while helping to hide the CPU-GPU memory transaction overhead. We compare the performance of our BCG implementation with that of a CPU implementation using Intel Xeon Phi coprocessors in automatic offload mode. With a sufficient number of right-hand sides, the CPU-GPU implementation of BCG reaches a speedup of 2.61 over the CPU-only implementation, which is significantly higher than that of the CPU-Intel Xeon Phi implementation.
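The abstract's central idea, orthonormalizing the block of search directions through the eigendecomposition of a small s-by-s matrix rather than a QR factorization, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that A is symmetric positive definite, that the n-by-s right-hand side block B has linearly independent columns, and that the directions are made A-orthonormal (the abstract does not state which inner product is used). The names `chunked_matmul`, `a_orthonormalize`, `block_cg` and `rows_per_chunk` are hypothetical; `chunked_matmul` only mimics the row-block structure of an offload scheme and runs entirely on the CPU.

```python
import numpy as np


def chunked_matmul(A, Z, rows_per_chunk):
    """Compute A @ Z one block of rows at a time.

    Hypothetical stand-in for a hybrid offload scheme: each row block of A
    plays the role of a piece that would be copied to GPU memory and
    multiplied there. Here every chunk is processed on the CPU. Only row
    slicing and @ are used, so a scipy.sparse CSR matrix should also work.
    """
    n = A.shape[0]
    out = np.empty((n, Z.shape[1]))
    for start in range(0, n, rows_per_chunk):
        stop = min(start + rows_per_chunk, n)
        out[start:stop] = A[start:stop] @ Z
    return out


def a_orthonormalize(Z, W):
    """Make the columns of Z A-orthonormal, given W = A @ Z.

    Uses the eigendecomposition of the small s-by-s matrix G = Z^T A Z
    instead of a QR factorization. Assumes Z has full column rank, so that
    every eigenvalue of G is positive.
    """
    G = Z.T @ W
    G = 0.5 * (G + G.T)          # symmetrize against round-off
    lam, V = np.linalg.eigh(G)   # G = V diag(lam) V^T
    T = V / np.sqrt(lam)         # T = V diag(lam)^(-1/2), so T^T G T = I
    return Z @ T, W @ T          # P with P^T A P = I, and Q = A P


def block_cg(A, B, tol=1e-10, max_iter=500, rows_per_chunk=64):
    """Solve A X = B for SPD A (assumed) and an n-by-s block B."""
    X = np.zeros_like(B, dtype=float)
    R = B.astype(float)          # residual B - A X for the initial guess X = 0
    Z = R.copy()                 # first (not yet orthonormalized) directions
    b_norms = np.linalg.norm(B, axis=0)
    for it in range(max_iter):
        W = chunked_matmul(A, Z, rows_per_chunk)  # one product with A per iteration
        P, Q = a_orthonormalize(Z, W)
        alpha = P.T @ R          # (P^T A P)^{-1} P^T R, with P^T A P = I
        X += P @ alpha
        R -= Q @ alpha
        if np.all(np.linalg.norm(R, axis=0) <= tol * b_norms):
            return X, it + 1
        Z = R - P @ (Q.T @ R)    # new directions, A-conjugate to the columns of P
    return X, max_iter
```

Usage, for an SPD `A` and a block `B` of right-hand sides: `X, iters = block_cg(A, B)`. Forming `Z.T @ W` and applying `T` are dense matrix-matrix products that map well to GPU BLAS, and the eigendecomposition only touches an s-by-s matrix, whereas a Householder QR of the tall n-by-s block is harder to parallelize. The trade-off is that working with the Gram matrix squares the conditioning of `Z`: if the directions become nearly linearly dependent, some eigenvalues of `G` approach zero, and a production solver would need to deflate or otherwise guard against that case.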
Pages: 72-77
Number of pages: 6