A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures

Cited by: 10
Authors
Quintana-Orti, Gregorio [1]
Igual, Francisco D. [1]
Marques, Mercedes [1]
Quintana-Orti, Enrique S. [1]
van de Geijn, Robert A. [2]
Affiliations
[1] Univ Jaume 1, Dept Ingn & Ciencia Comp, Castellon de La Plana 12071, Spain
[2] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
Source
Keywords
Algorithms; Performance; High-performance; libraries; linear algebra; multithreaded architectures; out-of-core algorithms; HIGH-PERFORMANCE; COMPUTATION
DOI
10.1145/2331130.2331133
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Out-of-core implementations of algorithms for dense matrix computations have traditionally focused on optimal use of memory so as to minimize I/O, often trading programmability for performance. In this article we show how the current state of hardware and software allows the programmability problem to be addressed without sacrificing performance. This follows from two realizations: memory is now cheap and large, which makes it less necessary to orchestrate I/O optimally; and new algorithms view matrices as collections of submatrices and computation as operations on those submatrices. This enables libraries to be coded at a high level of abstraction, leaving the scheduling of computations and data movement to a runtime system. This contrasts sharply with more traditional approaches, which rely on optimal use of in-core memory and explicitly overlap I/O with computation, at the cost of considerable programming complexity. Performance is demonstrated for this approach on multicore architectures as well as on platforms equipped with hardware accelerators.
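The abstract's central idea can be illustrated with a small sketch: a matrix stored as a collection of tiles, an algorithm written only as operations on those tiles, and a separate layer that decides when tiles move between disk and memory. The Python code below is a hypothetical, simplified illustration and not the authors' runtime system. All names are assumptions: `DiskStore` simulates the disk with an in-memory dictionary that counts tile reads and writes, `TileCache` keeps at most `capacity` tiles in core with least-recently-used eviction and write-back of modified tiles, and `run_tiled_gemm` executes the tile tasks of C += A·B one after another, whereas the runtime described above also schedules the tasks for parallel execution.

```python
from collections import OrderedDict


def tile_gemm(c, a, b):
    """Tile kernel (hypothetical): c += a * b for square tiles stored as lists of rows."""
    n = len(c)
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]


class DiskStore:
    """Simulated disk (assumption): tiles keyed by (matrix, tile row, tile col); counts I/O."""
    def __init__(self):
        self.tiles = {}
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return [row[:] for row in self.tiles[key]]  # return a copy, like a real read

    def write(self, key, tile):
        self.writes += 1
        self.tiles[key] = [row[:] for row in tile]


class TileCache:
    """In-core tile cache with LRU eviction and write-back of dirty (modified) tiles."""
    def __init__(self, store, capacity):
        if capacity < 3:
            raise ValueError("a GEMM tile task needs 3 resident tiles")
        self.store = store
        self.capacity = capacity
        self.resident = OrderedDict()  # key -> [tile, dirty flag], least recent first

    def get(self, key, will_write=False):
        if key in self.resident:
            self.resident.move_to_end(key)  # mark as most recently used
        else:
            if len(self.resident) >= self.capacity:
                old_key, (old_tile, dirty) = self.resident.popitem(last=False)
                if dirty:
                    self.store.write(old_key, old_tile)  # write back before eviction
            self.resident[key] = [self.store.read(key), False]
        entry = self.resident[key]
        entry[1] = entry[1] or will_write
        return entry[0]

    def flush(self):
        for key, (tile, dirty) in self.resident.items():
            if dirty:
                self.store.write(key, tile)
        self.resident.clear()


def run_tiled_gemm(store, nt, cache_tiles):
    """C += A*B over an nt x nt grid of tiles, one task per (i, j, k) tile update."""
    tasks = [(i, j, k) for i in range(nt) for j in range(nt) for k in range(nt)]
    cache = TileCache(store, cache_tiles)
    for i, j, k in tasks:  # sequential here; a real runtime would also run tasks in parallel
        c = cache.get(("C", i, j), will_write=True)
        a = cache.get(("A", i, k))
        b = cache.get(("B", k, j))
        tile_gemm(c, a, b)
    cache.flush()


def store_matrix(store, name, m, bs):
    """Split an n x n matrix into bs x bs tiles (n divisible by bs) and write them out."""
    nt = len(m) // bs
    for i in range(nt):
        for j in range(nt):
            tile = [row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
            store.write((name, i, j), tile)


def load_matrix(store, name, n, bs):
    """Reassemble a matrix from its tiles (reads the dict directly, so not counted as I/O)."""
    m = [[0.0] * n for _ in range(n)]
    for (tag, ti, tj), tile in store.tiles.items():
        if tag == name:
            for r in range(bs):
                m[ti * bs + r][tj * bs:(tj + 1) * bs] = tile[r]
    return m


n, bs = 4, 2
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float((i * j) % 3) for j in range(n)] for i in range(n)]
disk = DiskStore()
for name, mat in (("A", A), ("B", B), ("C", [[0.0] * n for _ in range(n)])):
    store_matrix(disk, name, mat, bs)
run_tiled_gemm(disk, n // bs, cache_tiles=3)
C = load_matrix(disk, "C", n, bs)
print("tile reads:", disk.reads, "tile writes:", disk.writes)
```

Because `run_tiled_gemm` only names tiles, changing `cache_tiles` changes how much tile I/O a run performs without touching the algorithm itself. This mirrors, in miniature, the separation the abstract argues for: the library code stays at a high level of abstraction while data movement is handled elsewhere.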
Pages: 25
Related papers
50 in total
  • [1] Out-of-core macromolecular simulations on multithreaded architectures
    Aliaga, Jose I.
    Badia, Jose M.
    Castillo, Maribel
    Davidovic, Davor
    Mayo, Rafael
    Quintana-Orti, Enrique S.
    Concurrency and Computation: Practice and Experience, 2015, 27(6): 1540-1550
  • [2] Fast multithreaded out-of-core visualization technique
    Sulatycke, Peter D.
    Ghose, Kanad
    Proceedings of the International Parallel Processing Symposium (IPPS), 1999: 569-575
  • [3] Grid and cluster matrix computation with persistent storage and out-of-core programming
    Aouad, Lamine M.
    Petiton, Serge G.
    Sato, Mitsuhisa
    2005 IEEE International Conference on Cluster Computing (CLUSTER), 2006: 372-380
  • [4] A Framework to Transform In-Core GPU Algorithms to Out-of-Core Algorithms
    Harada, Takahiro
    Proceedings I3D 2016: 20th ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, 2016: 179-180
  • [5] Out-of-core Algorithms for Binary Partition Hierarchies
    Lefèvre, Josselin
    Cousty, Jean
    Perret, Benjamin
    Phelippeau, Harold
    Journal of Mathematical Imaging and Vision, 2025, 67(2)
  • [6] An efficient algorithm for out-of-core matrix transposition
    Suh, J
    Prasanna, VK
    IEEE Transactions on Computers, 2002, 51(4): 420-438
  • [7] Big Data Analytics Performance for Large Out-Of-Core Matrix Solvers on Advanced Hybrid Architectures
    Rao, Raghavendra Shruti
    Halem, Milton
    Dorband, John
    International Conference on Computational Science, ICCS 2015: Computational Science at the Gates of Nature, 2015, 51: 2774-2778
  • [8] A parallel programming interface for out-of-core cluster applications
    Tang, Jianqi
    Fang, Binxing
    Hu, Mingzeng
    Zhang, Hongli
    Cluster Computing: The Journal of Networks, Software Tools and Applications, 2006, 9(3): 321-327