An Implementation of Block Conjugate Gradient Algorithm on CPU-GPU Processors

Cited by: 4
Authors
Ji, Hao [1]
Sosonkina, Masha [2]
Li, Yaohang [1]
Affiliations
[1] Old Dominion Univ, Dept Comp Sci, Norfolk, VA 23529 USA
[2] Old Dominion Univ, Dept Modeling Simulat & Visualizat Engn, Norfolk, VA 23529 USA
Keywords
Block Conjugate Gradient; Multi-core CPU; Graphics Processing Unit; Intel Xeon Phi; Performance Evaluation;
DOI
10.1109/Co-HPC.2014.10
Chinese Library Classification number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In this paper, we investigate the implementation of the Block Conjugate Gradient (BCG) algorithm on CPU-GPU processors. By analyzing the performance of the various matrix operations in BCG, we identify the construction of new search direction matrices as the main performance bottleneck. Replacing the QR decomposition with the eigendecomposition of a small matrix remedies the problem by reducing the computational cost of generating orthogonal search directions. Moreover, a hybrid (offload) computing scheme is designed to enable the BCG implementation to handle linear systems with large, sparse coefficient matrices that cannot fit in GPU memory. The hybrid scheme offloads matrix operations to the GPU while helping to hide the CPU-GPU memory transaction overhead. We compare the performance of our BCG implementation with that of an implementation on a CPU with Intel Xeon Phi coprocessors using the automatic offload mode. With a sufficient number of right-hand sides, the CPU-GPU implementation of BCG can reach a speedup of 2.61 over the CPU-only implementation, which is significantly higher than that of the CPU-Intel Xeon Phi implementation.
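The abstract does not give the authors' formulation, so the following is only a minimal sketch of one common way to orthonormalize a tall, skinny block of vectors with an eigendecomposition of a small matrix instead of a QR decomposition. It assumes the block P (n rows, s columns, with s much smaller than n) has full column rank; the function name orthonormalize_via_gram and the test sizes are hypothetical and chosen for illustration. The idea is that the s-by-s Gram matrix G = P^T P is cheap to factor as G = V diag(lambda) V^T, and Q = P V diag(lambda)^(-1/2) then has orthonormal columns spanning the same space as P.

```python
import numpy as np

def orthonormalize_via_gram(P):
    """Return Q with orthonormal columns spanning the column space of P.

    Assumes P has full column rank, so every eigenvalue of P.T @ P is > 0.
    """
    G = P.T @ P                         # small s x s Gram matrix
    eigvals, V = np.linalg.eigh(G)      # symmetric eigendecomposition of G
    # Rotate by V, then scale column i by 1 / sqrt(lambda_i).
    return (P @ V) / np.sqrt(eigvals)

# Illustrative sizes only: n = 1000 unknowns, s = 8 right-hand sides.
rng = np.random.default_rng(0)
P = rng.standard_normal((1000, 8))
Q = orthonormalize_via_gram(P)

# Largest deviation of Q.T @ Q from the identity matrix.
orthogonality_error = np.abs(Q.T @ Q - np.eye(8)).max()
```

The dominant costs here are two tall-skinny matrix products, which map well onto dense GPU kernels, plus an eigendecomposition whose size depends only on the number of right-hand sides. Forming the Gram matrix squares the condition number of P, so this shortcut can lose accuracy when the columns of P are nearly linearly dependent.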
Pages: 72-77
Number of pages: 6