An Implementation of Block Conjugate Gradient Algorithm on CPU-GPU Processors

Cited by: 4
Authors
Ji, Hao [1]
Sosonkina, Masha [2]
Li, Yaohang [1]
Affiliations
[1] Old Dominion Univ, Dept Comp Sci, Norfolk, VA 23529 USA
[2] Old Dominion Univ, Dept Modeling Simulat & Visualizat Engn, Norfolk, VA 23529 USA
Keywords
Block Conjugate Gradient; Multi-core CPU; Graphics Processing Unit; Intel Xeon Phi; Performance Evaluation;
DOI
10.1109/Co-HPC.2014.10
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we investigate the implementation of the Block Conjugate Gradient (BCG) algorithm on CPU-GPU processors. By analyzing the performance of the various matrix operations in BCG, we identify the construction of new search direction matrices as the main performance bottleneck. Replacing the QR decomposition with an eigendecomposition of a small matrix remedies the problem by reducing the computational cost of generating orthogonal search directions. Moreover, a hybrid (offload) computing scheme is designed to enable the BCG implementation to handle linear systems with large, sparse coefficient matrices that cannot fit in GPU memory. The hybrid scheme offloads matrix operations to GPU processors while helping to hide the CPU-GPU memory transaction overhead. We compare the performance of our BCG implementation with that of a CPU implementation using Intel Xeon Phi coprocessors in automatic offload mode. With a sufficient number of right-hand sides, the CPU-GPU implementation of BCG can reach a speedup of 2.61 over the CPU-only implementation, which is significantly higher than that of the CPU-Intel Xeon Phi implementation.
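The abstract does not spell out how the eigendecomposition replaces QR, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes the search directions form a tall n x s matrix `P` with s much smaller than n, builds the small s x s Gram matrix `P.T @ P`, and uses its eigendecomposition to rescale `P` into a matrix with orthonormal columns. The function name `orthonormalize_via_eig` and the tolerance `tol` are hypothetical, chosen for illustration.

```python
import numpy as np

def orthonormalize_via_eig(P, tol=1e-12):
    """Return Q with orthonormal columns spanning the column space of P.

    Illustrative assumption: works on the small s x s Gram matrix
    instead of running a QR factorization on the tall n x s matrix.
    """
    G = P.T @ P                    # small s x s symmetric Gram matrix
    w, V = np.linalg.eigh(G)       # G = V diag(w) V^T, eigenvalues ascending
    keep = w > tol * w.max()       # drop near-zero eigenvalues (rank deficiency)
    # Scale each kept eigenvector by 1/sqrt(eigenvalue), then map back through P.
    return P @ (V[:, keep] / np.sqrt(w[keep]))

rng = np.random.default_rng(0)
P = rng.standard_normal((1000, 8))   # 8 search directions, hypothetical sizes
Q = orthonormalize_via_eig(P)
Q_qr, _ = np.linalg.qr(P)            # reference orthonormal basis from QR
```

In this sketch the only factorization is on an s x s matrix; the work on the tall matrix reduces to two matrix-matrix products, which is the kind of operation that maps well to a GPU. Forming the Gram matrix squares the condition number of `P`, so this approach can lose accuracy when the columns of `P` are nearly linearly dependent.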
Pages: 72-77
Number of pages: 6
Related papers
50 items in total
  • [31] Parabolic Radon transform parallel algorithm for CPU-GPU heterogeneous platform
    Zhang Q.
    Lin B.
    Yang B.
    Peng B.
    Zhang W.
    Tu R.
    Shiyou Diqiu Wuli Kantan/Oil Geophysical Prospecting, 2020, 55 (06): : 1263 - 1270
  • [32] An industrial defect detection algorithm based on CPU-GPU parallel call
    Zhu Li
    Hong-wei Lin
    Yuan-yuan Liu
    Chong Chen
    Yun-fei Xia
    Multimedia Tools and Applications, 2023, 82 : 44191 - 44207
  • [33] An improved smith-waterman algorithm on heterogeneous CPU-GPU Systems
    Yin, Meng Jia
    Xu, Xianbin
    Xiong, Zenggang
    Zhang, Tao
    Zheng, Fang
    International Journal of Applied Mathematics and Statistics, 2013, 50 (20): : 499 - 507
  • [34] A Peta-scalable CPU-GPU Algorithm for Global Atmospheric Simulations
    Yang, Chao
    Xue, Wei
    Fu, Haohuan
    Gan, Lin
    Li, Linfeng
    Xu, Yangtong
    Lu, Yutong
    Sun, Jiachang
    Yang, Guangwen
    Zheng, Weimin
    ACM SIGPLAN NOTICES, 2013, 48 (08) : 1 - 11
  • [35] Understanding co-run performance on CPU-GPU integrated processors: observations, insights, directions
    Qi Zhu
    Bo Wu
    Xipeng Shen
    Kai Shen
    Li Shen
    Zhiying Wang
    Frontiers of Computer Science, 2017, 11 : 130 - 146
  • [36] Understanding co-run performance on CPU-GPU integrated processors: observations, insights, directions
    Zhu, Qi
    Wu, Bo
    Shen, Xipeng
    Shen, Kai
    Shen, Li
    Wang, Zhiying
    FRONTIERS OF COMPUTER SCIENCE, 2017, 11 (01) : 130 - 146
  • [37] CPU-Assisted GPGPU on Fused CPU-GPU Architectures
    Yang, Yi
    Xiang, Ping
    Mantor, Mike
    Zhou, Huiyang
    2012 IEEE 18TH INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2012, : 103 - 114
  • [38] A Hybrid CPU-GPU Implementation to Accelerate Multiple Pairwise Protein Sequence Alignment
    Shehab, Mohammed A.
    Ghadawi, Abdullah A.
    Alawneh, Luay
    Al-Ayyoub, Mahmoud
    Jararweh, Yaser
    2017 8TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2017, : 12 - 17
  • [39] AN EFFICIENT CPU-GPU IMPLEMENTATION OF THE MULTIPLE ABSORPTION COEFFICIENT ZONAL METHOD (MACZM)
    Ghannam, Boutros
    Nemer, Maroun
    El Khoury, Khalil
    Yuen, Walter
    NUMERICAL HEAT TRANSFER PART B-FUNDAMENTALS, 2012, 62 (06) : 439 - 461
  • [40] Implementation and Analysis of GNSS Software Receiver on Embedded CPU-GPU Heterogeneous Architecture
    Park, Kwi Woo
    Jang, Woo Jin
    Park, Chansik
    Kim, Sunwoo
    Lee, Min Jun
    PROCEEDINGS OF THE 29TH INTERNATIONAL TECHNICAL MEETING OF THE SATELLITE DIVISION OF THE INSTITUTE OF NAVIGATION (ION GNSS+ 2016), 2016, : 70 - 76