A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures

Cited by: 10
Authors
Quintana-Orti, Gregorio [1 ]
Igual, Francisco D. [1 ]
Marques, Mercedes [1 ]
Quintana-Orti, Enrique S. [1 ]
van de Geijn, Robert A. [2 ]
Affiliations
[1] Univ Jaume 1, Dept Ingn & Ciencia Comp, Castellon de La Plana 12071, Spain
[2] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
Source
Keywords
Algorithms; Performance; High-performance; libraries; linear algebra; multithreaded architectures; out-of-core algorithms; HIGH-PERFORMANCE; COMPUTATION;
DOI
10.1145/2331130.2331133
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how the current state of hardware and software allows the programmability problem to be addressed without sacrificing performance. This comes from the realizations that memory is cheap and large, making it less necessary to optimally orchestrate I/O, and that new algorithms view matrices as collections of submatrices and computation as operations with those submatrices. This enables libraries to be coded at a high level of abstraction, leaving the tasks of scheduling the computations and data movement in the hands of a runtime system. This is in sharp contrast to more traditional approaches that leverage optimal use of in-core memory and, at the expense of introducing considerable programming complexity, explicit overlap of I/O with computation. Performance is demonstrated for this approach on multicore architectures as well as platforms equipped with hardware accelerators.
Pages: 25
Related papers
50 in total
  • [21] Performance prediction and analysis of parallel out-of-core matrix factorization
    Caron, E
    Lazure, D
    Utard, G
    HIGH PERFORMANCE COMPUTING - HIPC 2000, PROCEEDINGS, 2001, 1970 : 161 - 172
  • [22] Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures
    Deveci, Mehmet
    Trott, Christian
    Rajamanickam, Sivasankaran
    PARALLEL COMPUTING, 2018, 78 : 33 - 46
  • [23] ON COMPUTING INVERSE ENTRIES OF A SPARSE MATRIX IN AN OUT-OF-CORE ENVIRONMENT
    Amestoy, Patrick R.
    Duff, Iain S.
    L'Excellent, Jean-Yves
    Robert, Yves
    Rouet, Francois-Henry
    Ucar, Bora
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2012, 34 (04): : A1975 - A1999
  • [24] Efficient Out-of-Core and Out-of-Place Rectangular Matrix Transposition and Rotation
    Godard, Paul
    Loechner, Vincent
    Bastoul, Cedric
    IEEE TRANSACTIONS ON COMPUTERS, 2021, 70 (11) : 1942 - 1948
  • [25] OCAM: Out-of-core coordinate descent algorithm for matrix completion
    Lee, Dongha
    Oh, Jinoh
    Yu, Hwanjo
    INFORMATION SCIENCES, 2020, 514 (514) : 587 - 604
  • [26] Out-of-Core and Dynamic Programming for Data Distribution on a Volume Visualization Cluster
    Frank, S.
    Kaufman, A.
    COMPUTER GRAPHICS FORUM, 2009, 28 (01) : 141 - 153
  • [27] Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver
    Krishnan, S
    Krishnamoorthy, S
    Baumgartner, G
    Lam, CC
    Ramanujam, J
    Sadayappan, P
    Choppella, V
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2006, 66 (05) : 659 - 673
  • [29] Efficient out-of-core algorithms for linear relaxation using blocking covers
    Leiserson, CE
    Rao, S
    Toledo, S
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1997, 54 (02) : 332 - 344
  • [30] A Unified Runtime System for Heterogeneous Multi-core Architectures
    Augonnet, Cedric
    Namyst, Raymond
    EURO-PAR 2008 WORKSHOPS - PARALLEL PROCESSING, 2009, 5415 : 174 - 183