Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector Multiplication

Cited by: 1
Authors:
Hwang, Soojin [1 ]
Baek, Daehyeon [1 ]
Park, Jongse [1 ]
Huh, Jaehyuk [1 ]
Affiliation:
[1] Korea Adv Inst Sci & Technol, Sch Comp, 291 Daehak Ro, Daejeon 34141, South Korea
Keywords:
Sparse Matrix-Vector Multiplication (SpMV); accelerator
DOI:
10.1145/3653020
Chinese Library Classification (CLC):
TP3 [Computing technology, computer technology]
Subject classification code:
0812
Abstract
Sparse matrix-vector multiplication (SpMV) is one of the most widely used kernels in high-performance computing as well as in machine learning acceleration for sparse neural networks. The design space of SpMV accelerators has two axes: algorithm and matrix representation. Two algorithms, scalar multiplication and dot product, have been widely used, and each can be combined with one of two sparse data representations for the matrix and vector: the compressed sparse format and the bitmap format. Although prior accelerators each adopted one of these possible designs, it has not yet been investigated which design is best across different hardware resources and workload characteristics. This paper first investigates the impact of the design choices in algorithm and data representation. Our evaluation shows that no single design always outperforms the others across different workloads, but the two best designs (i.e., the compressed sparse and bitmap formats, each combined with dot product) have complementary performance, with trade-offs determined by the matrix characteristics. Based on this analysis, the study proposes Cerberus, a triple-mode accelerator supporting two sparse operation modes in addition to the base dense mode. To enable such multi-mode operation, it introduces a prediction model based on matrix characteristics under a given hardware configuration, which statically selects the best mode for a given sparse matrix from its dimensions and density. Our experimental results show that Cerberus provides a 12.1x performance improvement over a dense-only accelerator and a 1.5x improvement over a fixed best SpMV design.
Pages: 24
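
To make the abstract's design-space axes concrete, the following is a minimal Python/NumPy sketch of dot-product SpMV over the two sparse matrix representations the paper contrasts: a compressed sparse row (CSR)-style encoding, one concrete instance of the compressed sparse format, and a bitmap encoding. This is an illustration only, not the Cerberus hardware datapath or the authors' code; all function and variable names are invented for the example.

# Illustrative sketch of dot-product SpMV (y = A @ x) over two sparse formats.
# Not the Cerberus implementation; names and data are hypothetical.
import numpy as np

def spmv_csr(row_ptr, col_idx, values, x):
    """Dot-product SpMV over a compressed sparse row (CSR) matrix."""
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for r in range(n_rows):
        # Each output element is a dot product of the row's stored non-zeros with x.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[k] * x[col_idx[k]]
    return y

def spmv_bitmap(bitmap, values, x):
    """Dot-product SpMV over a bitmap-encoded matrix.

    `bitmap` is an n_rows x n_cols array of 0/1 flags marking non-zero
    positions; `values` packs the non-zeros in row-major order, so the
    position metadata costs one bit per element instead of an index.
    """
    n_rows, n_cols = bitmap.shape
    y = np.zeros(n_rows)
    k = 0  # cursor into the packed value stream
    for r in range(n_rows):
        for c in range(n_cols):
            if bitmap[r, c]:
                y[r] += values[k] * x[c]
                k += 1
    return y

# Tiny example: a 3x4 matrix with 4 non-zeros, checked against dense SpMV.
dense = np.array([[1., 0., 2., 0.],
                  [0., 0., 0., 3.],
                  [0., 4., 0., 0.]])
x = np.array([1., 2., 3., 4.])

row_ptr = np.array([0, 2, 3, 4])
col_idx = np.array([0, 2, 3, 1])
values  = np.array([1., 2., 3., 4.])
bitmap  = (dense != 0).astype(np.uint8)

assert np.allclose(spmv_csr(row_ptr, col_idx, values, x), dense @ x)
assert np.allclose(spmv_bitmap(bitmap, values, x), dense @ x)

Roughly, the CSR form stores an explicit column index for every non-zero, whereas the bitmap form spends one bit per matrix position regardless of density; this is one way the two formats trade off against each other as matrix dimensions and density vary, consistent with the abstract's observation that the two best designs are complementary.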