Acceleration of Approximate Matrix Multiplications on GPUs

Cited: 0
Authors
Okuyama, Takuya [1]
Röhm, André [1]
Mihana, Takatomo [1]
Naruse, Makoto [1]
Affiliations
[1] The University of Tokyo, Graduate School of Information Science and Technology, Department of Information Physics and Computing, Tokyo 113-8656, Japan
Keywords
approximate calculation; approximate matrix multiplication; GPU computing; algorithms; alignment
DOI
10.3390/e25081130
Chinese Library Classification
O4 [Physics]
Subject Classification Code
0702
Abstract
Matrix multiplication is important in various information-processing applications, including the computation of eigenvalues and eigenvectors and combinatorial optimization algorithms. Therefore, reducing the computation time of matrix products is essential to speed up scientific and practical calculations. Several approaches have been proposed to speed up this process, including GPUs, fast matrix multiplication libraries, custom hardware, and efficient approximate matrix multiplication (AMM) algorithms. However, research to date has yet to focus on accelerating AMMs for general matrices on GPUs, despite the potential of GPUs to perform fast and accurate matrix product calculations. In this paper, we propose a method for improving Monte Carlo AMMs. We also give an analytical solution for the optimal values of the hyperparameters in the proposed method. The proposed method improves the approximation of the matrix product without increasing the computation time compared to the conventional AMMs. It is also designed to work well with parallel operations on GPUs and can be incorporated into various algorithms. Finally, the proposed method is applied to a power method used for eigenvalue computation. We demonstrate that, on an NVIDIA A100 GPU, the computation time can be halved compared to the conventional power method using cuBLAS.
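
As background for the abstract above: the conventional Monte Carlo AMM it refers to is the column-row sampling estimator, in which the product AB is approximated by drawing a small number of column-row pairs with probabilities proportional to their norm products and rescaling so the estimate is unbiased. The NumPy sketch below illustrates only this conventional baseline, not the paper's improved method or its analytically optimal hyperparameters; the function name monte_carlo_amm, the parameter n_samples, and the toy matrix sizes are illustrative assumptions of ours.

import numpy as np

def monte_carlo_amm(A, B, n_samples, seed=None):
    # Approximate A @ B by sampling column-row outer products.
    # Column k of A (paired with row k of B) is drawn with probability
    # proportional to ||A[:, k]|| * ||B[k, :]|| and rescaled so that the
    # estimator is unbiased. This is the conventional baseline, not the
    # improved scheme proposed in the paper.
    rng = np.random.default_rng(seed)
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    probs = weights / weights.sum()
    idx = rng.choice(A.shape[1], size=n_samples, p=probs)
    scale = 1.0 / (n_samples * probs[idx])  # unbiasedness correction
    return (A[:, idx] * scale) @ B[idx, :]

# Toy check: the relative error shrinks as n_samples grows.
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 512))
B = rng.standard_normal((512, 128))
exact = A @ B
approx = monte_carlo_amm(A, B, n_samples=128, seed=1)
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))

In the application described in the abstract, an approximate product of this kind would stand in for the exact multiplication inside each power-method iteration, trading a controlled amount of accuracy for reduced arithmetic on the GPU.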
Pages: 19