FSpGEMM: A Framework for Accelerating Sparse General Matrix–Matrix Multiplication Using Gustavson’s Algorithm on FPGAs

Cited by: 0

Authors:

Tavakoli, Erfan Bank [1]

Riera, Michael [1]

Quraishi, Masudul Hassan [1]

Ren, Fengbo [1]

Affiliations:

[1] Arizona State Univ, Tempe, AZ 85287 USA

Source:

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS | 2024, Vol. 32, No. 4

Keywords:

Field programmable gate arrays; Sparse matrices; Hardware; Memory management; Graphics processing units; Indexes; Matrix converters; Field-programmable gate array (FPGA); general sparse matrix-matrix multiplication (SpGEMM); Gustavson's algorithm; OpenCL; reconfigurable computing

DOI:

10.1109/TVLSI.2024.3355499

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

General sparse matrix-matrix multiplication (SpGEMM) is integral to many high-performance computing (HPC) and machine learning applications. However, prior field-programmable gate array (FPGA)-based SpGEMM accelerators either use the inner product algorithm, which incurs wasted and costly operations, or use Gustavson's algorithm with a cache-based hardware architecture that suffers from long-latency cache miss penalties and is limited to embedded devices. In this work, we propose FSpGEMM, an OpenCL-based framework for accelerating SpGEMM with Gustavson's algorithm. It includes an FPGA kernel implementing a throughput-optimized and scalable hardware architecture compatible with high-bandwidth memory (HBM) or traditional DDR-based memory. In addition, to address the irregular memory access patterns incurred by Gustavson's algorithm, we propose a new buffering scheme tailored to the algorithm. The scheme is enabled by a new compressed sparse vector (CSV) format for representing sparse matrices and by a row reordering technique, applied as a preprocessing step, that improves data reuse and consequently resource utilization. The proposed framework includes a host program implementing preprocessing functions that reorder input matrices and store them in the proposed CSV format for further use. We implemented FSpGEMM using the Intel FPGA SDK for OpenCL and experimented with a benchmark of sparse matrices selected from the SuiteSparse Matrix Collection on a Bittware 520N-MX FPGA board. The results show that the reordering technique improves performance by 20.3% on average compared with the baseline. Finally, FSpGEMM outperforms the state-of-the-art (SOTA) FPGA implementation by an average of 2.23$\times$ in terms of execution cycles, using the same benchmark and memory system configuration for a fair comparison.
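For readers unfamiliar with the row-wise formulation the abstract refers to, the following is a minimal software sketch of Gustavson's algorithm on matrices stored in the standard compressed sparse row (CSR) layout. It is not the FSpGEMM hardware architecture, and it does not use the paper's CSV format or row reordering, whose details are not given in this record. The function name `spgemm_gustavson` and its argument names are hypothetical, chosen only for illustration.

```python
def spgemm_gustavson(a_indptr, a_indices, a_data, b_indptr, b_indices, b_data):
    """Compute C = A * B for CSR inputs, one output row at a time.

    Illustrative sketch only: plain CSR arrays, not the paper's CSV format.
    """
    c_indptr, c_indices, c_data = [0], [], []
    n_rows = len(a_indptr) - 1
    for i in range(n_rows):
        acc = {}  # sparse accumulator for row i of C: column -> partial sum
        for p in range(a_indptr[i], a_indptr[i + 1]):
            k, a_ik = a_indices[p], a_data[p]
            # Scale row k of B by A[i, k] and merge it into the accumulator.
            # Which rows of B get read depends on A's nonzero pattern,
            # which is the source of the irregular memory accesses.
            for q in range(b_indptr[k], b_indptr[k + 1]):
                j = b_indices[q]
                acc[j] = acc.get(j, 0) + a_ik * b_data[q]
        for j in sorted(acc):  # emit row i with sorted column indices
            c_indices.append(j)
            c_data.append(acc[j])
        c_indptr.append(len(c_indices))
    return c_indptr, c_indices, c_data


# A = [[1, 0, 2], [0, 3, 0]] and B = [[0, 4], [5, 0], [6, 0]] in CSR form.
c_indptr, c_indices, c_data = spgemm_gustavson(
    [0, 2, 3], [0, 2, 1], [1, 2, 3],
    [0, 1, 2, 3], [1, 0, 0], [4, 5, 6],
)
```

Unlike the inner product formulation, this loop only touches pairs of nonzeros that actually contribute to the result, but the rows of B are fetched in a data-dependent order, which is the access pattern the abstract's buffering scheme and reordering step are meant to address.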

Pages: 633-644

Page count: 12
