Sampled Dense Matrix Multiplication for High-Performance Machine Learning

Cited by: 16
Authors
Nisa, Israt [1]
Sukumaran-Rajam, Aravind [1]
Kurt, Sureyya Emre [1]
Hong, Changwan [1]
Sadayappan, P. [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
U.S. National Science Foundation
Keywords
SDDMM; GPU; Optimization; Sparse matrix; Factorization
DOI
10.1109/HiPC.2018.00013
CLC Number (Chinese Library Classification)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Many machine learning methods involve iterative optimization and are amenable to a variety of alternative formulations. Currently popular formulations of some of these methods are based on core operations that essentially correspond to sparse matrix-vector products. A reformulation using sparse matrix-matrix product primitives can potentially enable significant performance enhancement. Sampled Dense-Dense Matrix Multiplication (SDDMM) is a primitive that has been shown to be usable as a core component in reformulations of many machine learning factor analysis algorithms, such as Alternating Least Squares (ALS), Latent Dirichlet Allocation (LDA), Sparse Factor Analysis (SFA), and Gamma Poisson (GaP). It requires the computation of the product of two input dense matrices, but only at the locations of the result matrix corresponding to nonzero entries in a sparse third input matrix. In this paper, we address the development of cuSDDMM, a multi-node GPU-accelerated implementation of SDDMM. We analyze the data reuse characteristics of SDDMM and develop a model-driven strategy for choosing the tiling permutation and tile size. cuSDDMM improves significantly (by up to 4.6x) over the best currently available GPU implementation of SDDMM (in the BIDMach machine learning library).
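To make the primitive concrete, below is a minimal CUDA sketch of SDDMM as defined in the abstract; it is an illustrative baseline, not the paper's cuSDDMM kernel. It assumes the sparse matrix S is stored in CSR format, the dense inputs A (M x K) and B (N x K) are row-major (so the result is S .* (A x B^T)), and a simple one-thread-per-row mapping; all identifiers are hypothetical.

// sddmm_sketch.cu -- naive SDDMM reference (illustrative, not cuSDDMM).
// For every nonzero (i, j) of sparse S in CSR:
//   out[p] = vals[p] * dot(A[i, :], B[j, :])
// Error checking omitted for brevity.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sddmm_csr(int M, int K,
                          const int *rowPtr, const int *colIdx,
                          const float *vals, const float *A,
                          const float *B, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per row of S
    if (i >= M) return;
    for (int p = rowPtr[i]; p < rowPtr[i + 1]; ++p) {
        int j = colIdx[p];
        float dot = 0.0f;
        for (int k = 0; k < K; ++k)                 // dot(A[i,:], B[j,:])
            dot += A[i * K + k] * B[j * K + k];
        out[p] = vals[p] * dot;                     // scale by sampled entry
    }
}

int main()
{
    // Toy instance: 2 x 3 sparse S with 3 nonzeros, K = 2 latent factors.
    const int M = 2, N = 3, K = 2, nnz = 3;
    int   hRowPtr[M + 1] = {0, 2, 3};
    int   hColIdx[nnz]   = {0, 2, 1};               // nonzeros (0,0),(0,2),(1,1)
    float hVals[nnz]     = {1.f, 1.f, 1.f};
    float hA[M * K]      = {1.f, 2.f,   3.f, 4.f};
    float hB[N * K]      = {1.f, 0.f,   0.f, 1.f,   1.f, 1.f};
    float hOut[nnz];

    int *dRowPtr, *dColIdx;
    float *dVals, *dA, *dB, *dOut;
    cudaMalloc((void **)&dRowPtr, sizeof(hRowPtr));
    cudaMalloc((void **)&dColIdx, sizeof(hColIdx));
    cudaMalloc((void **)&dVals,   sizeof(hVals));
    cudaMalloc((void **)&dA,      sizeof(hA));
    cudaMalloc((void **)&dB,      sizeof(hB));
    cudaMalloc((void **)&dOut,    sizeof(hOut));
    cudaMemcpy(dRowPtr, hRowPtr, sizeof(hRowPtr), cudaMemcpyHostToDevice);
    cudaMemcpy(dColIdx, hColIdx, sizeof(hColIdx), cudaMemcpyHostToDevice);
    cudaMemcpy(dVals,   hVals,   sizeof(hVals),   cudaMemcpyHostToDevice);
    cudaMemcpy(dA,      hA,      sizeof(hA),      cudaMemcpyHostToDevice);
    cudaMemcpy(dB,      hB,      sizeof(hB),      cudaMemcpyHostToDevice);

    sddmm_csr<<<1, 32>>>(M, K, dRowPtr, dColIdx, dVals, dA, dB, dOut);
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    for (int p = 0; p < nnz; ++p)
        printf("out[%d] = %g\n", p, hOut[p]);       // expect 1, 3, 4
    return 0;
}

Compiled with nvcc, the toy driver prints 1, 3, and 4. The paper's contribution lies in how this computation is tiled for data reuse (the choice of tiling permutation and tile size), not in the naive per-row mapping shown here.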
Pages: 32-41
Page count: 10
Related Papers
50 records in total
  • [21] SparCML: High-Performance Sparse Communication for Machine Learning
    Renggli, Cedric
    Ashkboos, Saleh
    Aghagolzadeh, Mehdi
    Alistarh, Dan
    Hoefler, Torsten
    [J]. PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2019,
  • [22] Machine learning toward high-performance electrochemical sensors
    Giordano, Gabriela F.
    Ferreira, Larissa F.
Bezerra, Ítalo R. S.
    Barbosa, Julia A.
    Costa, Juliana N. Y.
    Pimentel, Gabriel J. C.
    Lima, Renato S.
    [J]. ANALYTICAL AND BIOANALYTICAL CHEMISTRY, 2023, 415 (18) : 3683 - 3692
  • [23] Network Support for High-Performance Distributed Machine Learning
    Malandrino, Francesco
    Chiasserini, Carla Fabiana
    Molner, Nuria
    de la Oliva, Antonio
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2023, 31 (01) : 264 - 278
  • [24] DeltaSPARSE: High-Performance Sparse General Matrix-Matrix Multiplication on Multi-GPU Systems
    Yang, Shuai
    Zhang, Changyou
    Ma, Ji
    [J]. 2023 IEEE 30TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC 2023, 2023, : 194 - 202
  • [25] Learning Everywhere: Pervasive Machine Learning for Effective High-Performance Computation
    Fox, Geoffrey
    Glazier, James A.
    Kadupitiya, J. C. S.
    Jadhao, Vikram
    Kim, Minje
    Qiu, Judy
    Sluka, James P.
    Somogyi, Endre
    Marathe, Madhav
    Adiga, Abhijin
    Chen, Jiangzhuo
    Beckstein, Oliver
    Jha, Shantenu
    [J]. 2019 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2019, : 422 - 429
  • [26] A high-performance batched matrix multiplication framework for GPUs under unbalanced input distribution
    Wang, Ruimin
    Yang, Zhiwei
    Xu, Hao
    Lu, Lu
    [J]. JOURNAL OF SUPERCOMPUTING, 2022, 78 (02): : 1741 - 1758
  • [27] SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms
    Elafrou, Athena
    Karakasis, Vasileios
    Gkountouvas, Theodoros
    Kourtis, Kornilios
    Goumas, Georgios
    Koziris, Nectarios
    [J]. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2018, 44 (03):
  • [28] IMPLEMENTING HIGH-PERFORMANCE COMPLEX MATRIX MULTIPLICATION VIA THE 1M METHOD
    Van Zee, Field G.
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2020, 42 (05): : C221 - C244
  • [29] A high-performance batched matrix multiplication framework for GPUs under unbalanced input distribution
    Ruimin Wang
    Zhiwei Yang
    Hao Xu
    Lu Lu
    [J]. The Journal of Supercomputing, 2022, 78 : 1741 - 1758
  • [30] A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication
    Xia, Yufan
    De La Pierre, Marco
    Barnard, Amanda S.
    Barca, Giuseppe Maria Junior
    [J]. 2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, 2023, : 524 - 534