Sampled Dense Matrix Multiplication for High-Performance Machine Learning

Cited by: 16
Authors
Nisa, Israt [1]
Sukumaran-Rajam, Aravind [1]
Kurt, Sureyya Emre [1]
Hong, Changwan [1]
Sadayappan, P. [1]
Affiliation
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
National Science Foundation (USA)
Keywords
SDDMM; GPU; Optimization; Sparse matrix; Factorization
DOI
10.1109/HiPC.2018.00013
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Many machine learning methods involve iterative optimization and are amenable to a variety of alternate formulations. Many currently popular formulations of some machine learning methods are based on core operations that essentially correspond to sparse matrix-vector products. A reformulation using sparse matrix-matrix product primitives can potentially enable significant performance enhancement. Sampled Dense-Dense Matrix Multiplication (SDDMM) is a primitive that has been shown to be usable as a core component in reformulations of many machine learning factor analysis algorithms such as Alternating Least Squares (ALS), Latent Dirichlet Allocation (LDA), Sparse Factor Analysis (SFA), and Gamma Poisson (GaP). It requires computing the product of two dense input matrices, but only at those locations of the result matrix that correspond to nonzero entries in a third, sparse input matrix. In this paper, we address the development of cuSDDMM, a multi-node GPU-accelerated implementation of SDDMM. We analyze the data reuse characteristics of SDDMM and develop a model-driven strategy for choosing the tiling permutation and tile size. cuSDDMM improves significantly (up to 4.6x) over the best currently available GPU implementation of SDDMM (in the BIDMach machine learning library).
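The SDDMM operation described in the abstract can be illustrated with a minimal sequential sketch. This is not the cuSDDMM implementation and does not reflect its tiling strategy. The function name `sddmm_coo`, the coordinate (COO) layout of the sparse matrix, the row-major storage of both dense matrices, and the optional scaling by the sparse values are all assumptions made for illustration.

```python
def sddmm_coo(rows, cols, A, B, vals=None):
    """Reference SDDMM sketch (hypothetical helper, not from the paper).

    rows, cols : coordinates of the nonzeros of the sparse matrix S (COO layout)
    A          : dense M x K matrix, stored as a list of rows
    B          : dense N x K matrix, stored as a list of rows, so the full
                 dense product would be A times the transpose of B
    vals       : optional nonzero values of S; when given, each sampled dot
                 product is scaled by the matching value (an assumption, the
                 abstract only specifies where the product is evaluated)
    Returns one output value per nonzero, in the same order as rows/cols.
    """
    out = []
    for idx, (i, j) in enumerate(zip(rows, cols)):
        # Dot product of row i of A with row j of B, computed only at (i, j).
        dot = sum(a * b for a, b in zip(A[i], B[j]))
        if vals is not None:
            dot *= vals[idx]
        out.append(dot)
    return out


# Small example: a 3 x 3 result sampled at three nonzero positions.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rows = [0, 1, 2]
cols = [2, 0, 1]
result = sddmm_coo(rows, cols, A, B)
scaled = sddmm_coo(rows, cols, A, B, vals=[2.0, 0.5, 1.0])
```

Only three of the nine entries of the dense product are ever computed here, which is the source of the savings over a full dense multiplication followed by masking.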
Pages: 32-41
Page count: 10