CPU vs. GPU - Performance comparison for the Gram-Schmidt algorithm

Authors
T. Brandes
A. Arnold
T. Soddemann
D. Reith
Affiliations
[1] Schloss Birlinghoven, Fraunhofer Institute SCAI
[2] University of Stuttgart, Institute for Computational Physics
Keywords
Graphics Processing Unit; European Physical Journal Special Topic; Shared Memory; Memory Bandwidth; Thread Block
DOI
Not available
Abstract
The Gram-Schmidt method is a classical method for determining QR decompositions and is commonly used in many applications in computational physics, such as the orthogonalization of quantum mechanical operators or Lyapunov stability analysis. In this paper, we discuss how well the Gram-Schmidt method performs on different hardware architectures, including both state-of-the-art GPUs and CPUs. We explain in detail how a smart interplay between hardware and software can be used to speed up these rather compute-intensive applications, and we discuss the benefits and disadvantages of several approaches. In addition, we compare some highly optimized standard routines of the BLAS libraries against our own optimized routines on both processor types. Particular attention was paid to the strongly hierarchical memory of modern GPUs and CPUs, which requires cache-aware blocking techniques for optimal performance. Our investigations show that the performance depends strongly on the employed algorithm and compiler, and somewhat less on the employed hardware. Remarkably, the performance of the NVIDIA CUDA BLAS routines improved significantly from CUDA 3.2 to CUDA 4.0. Still, BLAS routines tend to be slightly slower than manually optimized code on GPUs, while we were not able to outperform the BLAS routines on CPUs. Comparing optimized implementations on different hardware architectures, we find that an NVIDIA GeForce GTX580 GPU is about 50% faster than a corresponding Intel X5650 Westmere hexacore CPU. The self-written codes are included as supplementary material.
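For readers unfamiliar with the method named in the abstract, the following is a minimal sketch of how a Gram-Schmidt process yields a QR decomposition. It is not the authors' code (their implementations are in the supplementary material and are not reproduced here). It assumes the modified Gram-Schmidt variant, plain Python lists, and a square or tall matrix passed as a list of columns; the function name `gram_schmidt_qr` is hypothetical. No blocking or other performance optimization of the kind discussed in the paper is attempted.

```python
import math


def gram_schmidt_qr(columns):
    """QR-decompose a matrix given as a list of column vectors.

    Illustrative modified Gram-Schmidt (an assumption; the record does
    not say which variant the paper uses).

    Returns (q, r): q is a list of orthonormal columns and r is an upper
    triangular matrix stored as a list of rows, such that
    columns[j] == sum over i of r[i][j] * q[i] (up to rounding).
    """
    n = len(columns)
    # Work on copies so the caller's data is not modified.
    v = [list(col) for col in columns]
    q = []
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Normalise the i-th working vector to obtain q[i].
        r[i][i] = math.sqrt(sum(x * x for x in v[i]))
        if r[i][i] == 0.0:
            raise ValueError("column %d has no independent component" % i)
        q.append([x / r[i][i] for x in v[i]])
        # Remove the q[i] component from every remaining working vector.
        for j in range(i + 1, n):
            r[i][j] = sum(a * b for a, b in zip(q[i], v[j]))
            v[j] = [b - r[i][j] * a for a, b in zip(q[i], v[j])]
    return q, r
```

Calling `gram_schmidt_qr([[3.0, 4.0], [1.0, 2.0]])` returns two orthonormal columns and a 2 x 2 upper triangular factor. The inner loop over `j` is dominated by dot products and vector updates, which is the kind of memory-bound work for which the abstract says cache-aware blocking matters.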
Pages: 73-88
Page count: 15
Source
European Physical Journal Special Topics, 2012, 210 (01): 73-88