CPU vs. GPU - Performance comparison for the Gram-Schmidt algorithm

Authors
T. Brandes
A. Arnold
T. Soddemann
D. Reith
Affiliations
[1] Schloss Birlinghoven,Fraunhofer Institute SCAI
[2] University of Stuttgart,Institute for Computational Physics
Keywords
Graphic Processing Unit; European Physical Journal Special Topic; Shared Memory; Memory Bandwidth; Thread Block
DOI
Not available
Abstract
The Gram-Schmidt method is a classical method for computing QR decompositions and is commonly used in computational physics, for example in the orthogonalization of quantum mechanical operators or in Lyapunov stability analysis. In this paper, we discuss how well the Gram-Schmidt method performs on different hardware architectures, including both state-of-the-art GPUs and CPUs. We explain in detail how a smart interplay between hardware and software can be used to speed up these rather compute-intensive applications, and we discuss the benefits and disadvantages of several approaches. In addition, we compare some highly optimized standard routines of the BLAS libraries against our own optimized routines on both processor types. Particular attention was paid to the strongly hierarchical memory of modern GPUs and CPUs, which requires cache-aware blocking techniques for optimal performance. Our investigations show that performance depends strongly on the employed algorithm and compiler, and somewhat less on the employed hardware. Remarkably, the performance of the NVIDIA CUDA BLAS routines improved significantly from CUDA 3.2 to CUDA 4.0. Still, BLAS routines tend to be slightly slower than manually optimized code on GPUs, while we were not able to outperform the BLAS routines on CPUs. Comparing optimized implementations on different hardware architectures, we find that an NVIDIA GeForce GTX580 GPU is about 50% faster than a corresponding Intel X5650 Westmere hexacore CPU. The self-written codes are included as supplementary material.
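For readers unfamiliar with the method named in the abstract, the following is a minimal sketch of a classical Gram-Schmidt QR decomposition in plain Python. It is the textbook variant only, not the authors' code: the optimized, cache-blocked CPU and GPU routines discussed in the paper are in its supplementary material and are not reproduced here. The function name `gram_schmidt_qr` and the list-of-rows matrix layout are assumptions made for illustration.

```python
import math


def gram_schmidt_qr(a):
    """Classical Gram-Schmidt QR of an m x n matrix given as a list of rows.

    Returns (q, r), where q is m x n with orthonormal columns and r is
    n x n upper triangular, such that a = q * r. Assumes the columns of
    `a` are linearly independent (illustrative sketch, not tuned code).
    """
    m, n = len(a), len(a[0])
    # Work column-wise: cols[j] is the j-th column of a.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]

    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            # Coefficient of the original column j along q_k.
            r[k][j] = sum(q_cols[k][i] * cols[j][i] for i in range(m))
            # Remove that component from the working vector.
            for i in range(m):
                v[i] -= r[k][j] * q_cols[k][i]
        # Normalize what is left to obtain the next orthonormal column.
        r[j][j] = math.sqrt(sum(x * x for x in v))
        if r[j][j] == 0.0:
            raise ValueError("columns are linearly dependent")
        q_cols.append([x / r[j][j] for x in v])

    # Convert the orthonormal columns back to a list of rows.
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

The inner loops are dot products and vector updates, which is where BLAS routines or hand-written kernels would be substituted in a performance-oriented implementation. Note that the classical variant shown here can lose orthogonality in floating-point arithmetic for ill-conditioned inputs.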
Pages: 73-88
Page count: 15