Sampled Dense Matrix Multiplication for High-Performance Machine Learning

Cited by: 16
Authors
Nisa, Israt [1]
Sukumaran-Rajam, Aravind [1]
Kurt, Sureyya Emre [1]
Hong, Changwan [1]
Sadayappan, P. [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
U.S. National Science Foundation
Keywords
SDDMM; GPU; Optimization; Sparse matrix; Factorization
DOI
10.1109/HiPC.2018.00013
CLC Number (Chinese Library Classification)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Many machine learning methods involve iterative optimization and are amenable to a variety of alternative formulations. Currently popular formulations of some of these methods are based on core operations that essentially correspond to sparse matrix-vector products. A reformulation using sparse matrix-matrix product primitives can potentially enable significant performance enhancement. Sampled Dense-Dense Matrix Multiplication (SDDMM) is a primitive that has been shown to be usable as a core component in reformulations of many machine learning factor analysis algorithms, such as Alternating Least Squares (ALS), Latent Dirichlet Allocation (LDA), Sparse Factor Analysis (SFA), and Gamma Poisson (GaP). It requires the computation of the product of two input dense matrices, but only at the locations of the result matrix corresponding to nonzero entries in a sparse third input matrix. In this paper, we address the development of cuSDDMM, a multi-node GPU-accelerated implementation of SDDMM. We analyze the data reuse characteristics of SDDMM and develop a model-driven strategy for choosing the tiling permutation and tile size. cuSDDMM improves significantly (by up to 4.6x) over the best currently available GPU implementation of SDDMM (in the BIDMach machine learning library).
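To make the primitive concrete, below is a minimal CUDA sketch of SDDMM as defined in the abstract; it is an illustrative baseline, not the paper's cuSDDMM kernel. It assumes the sparse matrix S is stored in CSR format, the dense inputs A (M x K) and B (N x K) are row-major (so the result is S .* (A x B^T)), and a simple one-thread-per-row mapping; all identifiers are hypothetical.

// sddmm_sketch.cu -- naive SDDMM reference (illustrative, not cuSDDMM).
// For every nonzero (i, j) of sparse S in CSR:
//   out[p] = vals[p] * dot(A[i, :], B[j, :])
// Error checking omitted for brevity.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sddmm_csr(int M, int K,
                          const int *rowPtr, const int *colIdx,
                          const float *vals, const float *A,
                          const float *B, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per row of S
    if (i >= M) return;
    for (int p = rowPtr[i]; p < rowPtr[i + 1]; ++p) {
        int j = colIdx[p];
        float dot = 0.0f;
        for (int k = 0; k < K; ++k)                 // dot(A[i,:], B[j,:])
            dot += A[i * K + k] * B[j * K + k];
        out[p] = vals[p] * dot;                     // scale by sampled entry
    }
}

int main()
{
    // Toy instance: 2 x 3 sparse S with 3 nonzeros, K = 2 latent factors.
    const int M = 2, N = 3, K = 2, nnz = 3;
    int   hRowPtr[M + 1] = {0, 2, 3};
    int   hColIdx[nnz]   = {0, 2, 1};               // nonzeros (0,0),(0,2),(1,1)
    float hVals[nnz]     = {1.f, 1.f, 1.f};
    float hA[M * K]      = {1.f, 2.f,   3.f, 4.f};
    float hB[N * K]      = {1.f, 0.f,   0.f, 1.f,   1.f, 1.f};
    float hOut[nnz];

    int *dRowPtr, *dColIdx;
    float *dVals, *dA, *dB, *dOut;
    cudaMalloc((void **)&dRowPtr, sizeof(hRowPtr));
    cudaMalloc((void **)&dColIdx, sizeof(hColIdx));
    cudaMalloc((void **)&dVals,   sizeof(hVals));
    cudaMalloc((void **)&dA,      sizeof(hA));
    cudaMalloc((void **)&dB,      sizeof(hB));
    cudaMalloc((void **)&dOut,    sizeof(hOut));
    cudaMemcpy(dRowPtr, hRowPtr, sizeof(hRowPtr), cudaMemcpyHostToDevice);
    cudaMemcpy(dColIdx, hColIdx, sizeof(hColIdx), cudaMemcpyHostToDevice);
    cudaMemcpy(dVals,   hVals,   sizeof(hVals),   cudaMemcpyHostToDevice);
    cudaMemcpy(dA,      hA,      sizeof(hA),      cudaMemcpyHostToDevice);
    cudaMemcpy(dB,      hB,      sizeof(hB),      cudaMemcpyHostToDevice);

    sddmm_csr<<<1, 32>>>(M, K, dRowPtr, dColIdx, dVals, dA, dB, dOut);
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    for (int p = 0; p < nnz; ++p)
        printf("out[%d] = %g\n", p, hOut[p]);       // expect 1, 3, 4
    return 0;
}

Compiled with nvcc, the toy driver prints 1, 3, and 4. The paper's contribution lies in how this computation is tiled for data reuse (the choice of tiling permutation and tile size), not in the naive per-row mapping shown here.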
Pages: 32-41
Page count: 10
Related Papers
50 records in total
  • [21] SparCML: High-Performance Sparse Communication for Machine Learning
    Renggli, Cedric
    Ashkboos, Saleh
    Aghagolzadeh, Mehdi
    Alistarh, Dan
    Hoefler, Torsten
    [J]. PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2019,
  • [22] Machine learning toward high-performance electrochemical sensors
    Giordano, Gabriela F.
    Ferreira, Larissa F.
Bezerra, Ítalo R. S.
    Barbosa, Julia A.
    Costa, Juliana N. Y.
    Pimentel, Gabriel J. C.
    Lima, Renato S.
    [J]. ANALYTICAL AND BIOANALYTICAL CHEMISTRY, 2023, 415 (18) : 3683 - 3692
  • [23] Network Support for High-Performance Distributed Machine Learning
    Malandrino, Francesco
    Chiasserini, Carla Fabiana
    Molner, Nuria
    de la Oliva, Antonio
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2023, 31 (01) : 264 - 278
  • [24] DeltaSPARSE: High-Performance Sparse General Matrix-Matrix Multiplication on Multi-GPU Systems
    Yang, Shuai
    Zhang, Changyou
    Ma, Ji
    [J]. 2023 IEEE 30TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC 2023, 2023, : 194 - 202
  • [25] Learning Everywhere: Pervasive Machine Learning for Effective High-Performance Computation
    Fox, Geoffrey
    Glazier, James A.
    Kadupitiya, J. C. S.
    Jadhao, Vikram
    Kim, Minje
    Qiu, Judy
    Sluka, James P.
    Somogyi, Endre
    Marathe, Madhav
    Adiga, Abhijin
    Chen, Jiangzhuo
    Beckstein, Oliver
    Jha, Shantenu
    [J]. 2019 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2019, : 422 - 429
  • [26] A high-performance batched matrix multiplication framework for GPUs under unbalanced input distribution
    Wang, Ruimin
    Yang, Zhiwei
    Xu, Hao
    Lu, Lu
    [J]. JOURNAL OF SUPERCOMPUTING, 2022, 78 (02): : 1741 - 1758
  • [27] SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms
    Elafrou, Athena
    Karakasis, Vasileios
    Gkountouvas, Theodoros
    Kourtis, Kornilios
    Goumas, Georgios
    Koziris, Nectarios
    [J]. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2018, 44 (03):
  • [28] IMPLEMENTING HIGH-PERFORMANCE COMPLEX MATRIX MULTIPLICATION VIA THE 1M METHOD
    Van Zee, Field G.
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2020, 42 (05): : C221 - C244
  • [29] A high-performance batched matrix multiplication framework for GPUs under unbalanced input distribution
    Ruimin Wang
    Zhiwei Yang
    Hao Xu
    Lu Lu
    [J]. The Journal of Supercomputing, 2022, 78 : 1741 - 1758
  • [30] A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication
    Xia, Yufan
    De La Pierre, Marco
    Barnard, Amanda S.
    Barca, Giuseppe Maria Junior
    [J]. 2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, 2023, : 524 - 534