An Implementation of Block Conjugate Gradient Algorithm on CPU-GPU Processors

Cited by: 4
Authors
Ji, Hao [1]
Sosonkina, Masha [2]
Li, Yaohang [1]
Affiliations
[1] Old Dominion Univ, Dept Comp Sci, Norfolk, VA 23529 USA
[2] Old Dominion Univ, Dept Modeling Simulat & Visualizat Engn, Norfolk, VA 23529 USA
Keywords
Block Conjugate Gradient; Multi-core CPU; Graphics Processing Unit; Intel Xeon Phi; Performance Evaluation;
DOI
10.1109/Co-HPC.2014.10
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we investigate the implementation of the Block Conjugate Gradient (BCG) algorithm on CPU-GPU processors. By analyzing the performance of the various matrix operations in BCG, we identify the construction of new search direction matrices as the main performance bottleneck. Replacing the QR decomposition with the eigendecomposition of a small matrix remedies the problem by reducing the computational cost of generating orthogonal search directions. Moreover, a hybrid (offload) computing scheme is designed to enable the BCG implementation to handle linear systems with large, sparse coefficient matrices that cannot fit in GPU memory. The hybrid scheme offloads matrix operations to the GPU while helping to hide the CPU-GPU memory transfer overhead. We compare the performance of our BCG implementation with that of an implementation on a CPU with Intel Xeon Phi coprocessors using the automatic offload mode. With a sufficient number of right-hand sides, the CPU-GPU implementation of BCG reaches a speedup of 2.61 over the CPU-only implementation, which is significantly higher than that of the CPU-Intel Xeon Phi implementation.
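The abstract does not give the formulas behind the QR replacement, so the following is only a minimal sketch of one standard way to orthonormalize a tall block through the eigendecomposition of a small matrix; it is not necessarily the authors' formulation. It assumes a dense n x s block `W` with full column rank and runs on the CPU with NumPy; the function name `orthonormalize_via_eig` is hypothetical.

```python
import numpy as np


def orthonormalize_via_eig(W):
    """Return Q with orthonormal columns spanning the same space as W.

    Works on the small s x s Gram matrix W^T W instead of computing a QR
    factorization of the tall n x s block W. Assumes W has full column rank.
    """
    G = W.T @ W                     # s x s Gram matrix (one matrix product)
    lam, V = np.linalg.eigh(G)      # G = V diag(lam) V^T, lam in ascending order
    if lam[0] <= 0.0:
        raise ValueError("W appears rank deficient; Gram matrix is not positive definite")
    # Q = W V diag(lam)^(-1/2); broadcasting scales column j by lam[j]^(-1/2)
    return (W @ V) / np.sqrt(lam)


# Illustrative data only: a random block with 8 columns (8 right-hand sides).
rng = np.random.default_rng(0)
W = rng.standard_normal((1000, 8))
Q = orthonormalize_via_eig(W)
```

With this construction, Q^T Q equals the identity up to rounding, because Q^T Q = diag(lam)^(-1/2) V^T G V diag(lam)^(-1/2). The work on the tall block consists of two matrix products, and the eigendecomposition touches only the s x s matrix. The approach loses accuracy when `W` is ill-conditioned, since forming the Gram matrix squares the condition number.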
Pages: 72 - 77
Page count: 6