FSpGEMM: A Framework for Accelerating Sparse General Matrix–Matrix Multiplication Using Gustavson’s Algorithm on FPGAs

Authors
Tavakoli, Erfan Bank [1]
Riera, Michael [1]
Quraishi, Masudul Hassan [1]
Ren, Fengbo [1]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85287 USA
Keywords
Field programmable gate arrays; Sparse matrices; Hardware; Memory management; Graphics processing units; Indexes; Matrix converters; Field-programmable gate array (FPGA); general sparse matrix-matrix multiplication (SpGEMM); Gustavson's algorithm; OpenCL; reconfigurable computing
DOI
10.1109/TVLSI.2024.3355499
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
General sparse matrix-matrix multiplication (SpGEMM) is integral to many high-performance computing (HPC) and machine learning applications. However, prior field-programmable gate array (FPGA)-based SpGEMM accelerators either use the inner product algorithm, which performs wasted and costly operations, or use Gustavson's algorithm with a cache-based hardware architecture that suffers from long-latency cache miss penalties and is limited to embedded devices. In this work, we propose FSpGEMM, an OpenCL-based framework for accelerating SpGEMM with Gustavson's algorithm. It includes an FPGA kernel implementing a throughput-optimized and scalable hardware architecture compatible with high-bandwidth memory (HBM) or traditional DDR-based memory. In addition, to address the irregular memory access patterns incurred by Gustavson's algorithm, we propose a new buffering scheme tailored to the algorithm. The scheme is enabled by a new compressed sparse vector (CSV) format for representing sparse matrices and by a row reordering technique, applied as a preprocessing step, that improves data reuse and, consequently, resource utilization. The proposed framework includes a host program implementing preprocessing functions that reorder input matrices and store them in the proposed CSV format for further use. We implemented FSpGEMM using the Intel FPGA SDK for OpenCL and experimented with a benchmark of sparse matrices selected from the SuiteSparse Matrix Collection on a Bittware 520N-MX FPGA board. The results show that the reordering technique improves performance by 20.3% on average compared with the baseline. Finally, FSpGEMM outperforms the state-of-the-art (SOTA) FPGA implementation by an average of 2.23$\times$ in terms of execution cycles, using the same benchmark and memory system configuration for a fair comparison.
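For readers unfamiliar with the row-wise formulation the abstract refers to, the following is a minimal software sketch of Gustavson's algorithm on matrices in the standard compressed sparse row (CSR) layout. It is an illustration only: it does not reproduce the paper's CSV format, buffering scheme, row reordering, or FPGA kernel, and the names `dense_to_csr` and `gustavson_spgemm` are hypothetical helpers chosen for this example.

```python
def dense_to_csr(dense):
    """Convert a dense list-of-lists matrix to CSR arrays (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def gustavson_spgemm(a, b):
    """Row-wise (Gustavson) product C = A * B for CSR inputs.

    Row i of C is built as the sum of a_ik * (row k of B) over the
    nonzeros a_ik in row i of A, so only nonzero pairs are multiplied.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator for row i of C: column -> partial sum
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by a_ik and merge it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0) + a_ik * b_val[q]
        for j in sorted(acc):  # emit the row with sorted column indices
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


A = dense_to_csr([[1, 0, 2], [0, 0, 3], [4, 0, 0]])
B = dense_to_csr([[0, 5, 0], [6, 0, 0], [0, 0, 7]])
C = gustavson_spgemm(A, B)
```

The inner loop visits rows of B in an order dictated by the column indices of A's nonzeros, which is the source of the irregular memory accesses mentioned in the abstract. In contrast, an inner product formulation would intersect every row of A with every column of B, including pairs that share no nonzero index.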
Pages: 633-644
Page count: 12