Acceleration of Approximate Matrix Multiplications on GPUs

Cited: 0
Authors
Okuyama, Takuya [1]
Röhm, André [1]
Mihana, Takatomo [1]
Naruse, Makoto [1]
Affiliations
[1] The University of Tokyo, Graduate School of Information Science and Technology, Department of Information Physics and Computing, Tokyo 113-8656, Japan
Keywords
approximate calculation; approximate matrix multiplication; GPU computing; algorithms; alignment
DOI
10.3390/e25081130
Chinese Library Classification
O4 [Physics]
Subject Classification Code
0702
Abstract
Matrix multiplication is important in various information-processing applications, including the computation of eigenvalues and eigenvectors and combinatorial optimization algorithms. Therefore, reducing the computation time of matrix products is essential to speed up scientific and practical calculations. Several approaches have been proposed to speed up this process, including GPUs, fast matrix multiplication libraries, custom hardware, and efficient approximate matrix multiplication (AMM) algorithms. However, research to date has yet to focus on accelerating AMMs for general matrices on GPUs, despite the potential of GPUs to perform fast and accurate matrix product calculations. In this paper, we propose a method for improving Monte Carlo AMMs. We also give an analytical solution for the optimal values of the hyperparameters in the proposed method. The proposed method improves the approximation of the matrix product without increasing the computation time compared to the conventional AMMs. It is also designed to work well with parallel operations on GPUs and can be incorporated into various algorithms. Finally, the proposed method is applied to a power method used for eigenvalue computation. We demonstrate that, on an NVIDIA A100 GPU, the computation time can be halved compared to the conventional power method using cuBLAS.
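
As background for the abstract above: the conventional Monte Carlo AMM it refers to is the column-row sampling estimator, in which the product AB is approximated by drawing a small number of column-row pairs with probabilities proportional to their norm products and rescaling so the estimate is unbiased. The NumPy sketch below illustrates only this conventional baseline, not the paper's improved method or its analytically optimal hyperparameters; the function name monte_carlo_amm, the parameter n_samples, and the toy matrix sizes are illustrative assumptions of ours.

import numpy as np

def monte_carlo_amm(A, B, n_samples, seed=None):
    # Approximate A @ B by sampling column-row outer products.
    # Column k of A (paired with row k of B) is drawn with probability
    # proportional to ||A[:, k]|| * ||B[k, :]|| and rescaled so that the
    # estimator is unbiased. This is the conventional baseline, not the
    # improved scheme proposed in the paper.
    rng = np.random.default_rng(seed)
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    probs = weights / weights.sum()
    idx = rng.choice(A.shape[1], size=n_samples, p=probs)
    scale = 1.0 / (n_samples * probs[idx])  # unbiasedness correction
    return (A[:, idx] * scale) @ B[idx, :]

# Toy check: the relative error shrinks as n_samples grows.
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 512))
B = rng.standard_normal((512, 128))
exact = A @ B
approx = monte_carlo_amm(A, B, n_samples=128, seed=1)
print(np.linalg.norm(exact - approx) / np.linalg.norm(exact))

In the application described in the abstract, an approximate product of this kind would stand in for the exact multiplication inside each power-method iteration, trading a controlled amount of accuracy for reduced arithmetic on the GPU.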
Pages: 19