Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices

Cited by: 4
Authors
Gates, Mark [1]
Kurzak, Jakub [1]
Luszczek, Piotr [1]
Pei, Yu [1]
Dongarra, Jack [2, 3, 4]
Affiliations
[1] Univ Tennessee, Innovat Comp Lab, Knoxville, TN 37996 USA
[2] Univ Tennessee, Knoxville, TN 37996 USA
[3] Oak Ridge Natl Lab, Oak Ridge, TN USA
[4] Univ Manchester, Manchester, Lancs, England
Funding
National Science Foundation (USA)
Keywords
batch computation; GPU computing; numerical linear algebra; Cholesky factorization; data layout
DOI
10.1109/IPDPSW.2017.18
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Batch matrix operations address the case of solving the same linear algebra problem for a very large number of very small matrices. In this paper, we focus on implementing the batch Cholesky factorization in CUDA, in single precision arithmetic, for NVIDIA GPUs. Specifically, we look into the benefits of using noncanonical data layouts, where consecutive memory locations store elements with the same row and column index in a set of consecutive matrices. We discuss a number of different implementation options and tuning parameters. We demonstrate superior performance to traditional implementations for the case of very small matrices.
Pages: 1408-1417
Number of pages: 10