OuterSPACE: An Outer Product based Sparse Matrix Multiplication Accelerator

Cited by: 150

Authors:

Pal, Subhankar [1]

Beaumont, Jonathan [1]

Park, Dong-Hyeon [1]

Amarnath, Aporva [1]

Feng, Siying [1]

Chakrabarti, Chaitali [2]

Kim, Hun-Seok [1]

Blaauw, David [1]

Mudge, Trevor [1]

Dreslinski, Ronald [1]

Affiliations:

[1] Univ Michigan, Ann Arbor, MI 48109 USA

[2] Arizona State Univ, Tempe, AZ USA

Source:

2018 24TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA) | 2018

Keywords:

Sparse matrix processing; application-specific hardware; parallel computer architecture; hardware-software codesign; hardware accelerators; ALGEBRA; PERFORMANCE

DOI:

10.1109/HPCA.2018.00067

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Sparse matrices are widely used in graph and data analytics, machine learning, engineering and scientific applications. This paper describes and analyzes OuterSPACE, an accelerator targeted at applications that involve large sparse matrices. OuterSPACE is a highly scalable, energy-efficient, reconfigurable design, consisting of massively parallel Single Program, Multiple Data (SPMD)-style processing units, distributed memories, high-speed crossbars and High Bandwidth Memory (HBM). We identify redundant memory accesses to non-zeros as a key bottleneck in traditional sparse matrix-matrix multiplication algorithms. To ameliorate this, we implement an outer product based matrix multiplication technique that eliminates redundant accesses by decoupling multiplication from accumulation. We demonstrate that traditional architectures, due to limitations in their memory hierarchies and ability to harness parallelism in the algorithm, are unable to take advantage of this reduction without incurring significant overheads. OuterSPACE is designed to specifically overcome these challenges. We simulate the key components of our architecture using gem5 on a diverse set of matrices from the University of Florida's SuiteSparse collection and the Stanford Network Analysis Project and show a mean speedup of 7.9x over Intel Math Kernel Library on a Xeon CPU, 13.0x against cuSPARSE and 14.0x against CUSP when run on an NVIDIA K40 GPU, while achieving an average throughput of 2.9 GFLOPS within a 24 W power budget in an area of 87 mm².
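
Note: the outer product formulation mentioned in the abstract can be illustrated in software. The following is a minimal Python sketch of the general idea only (a multiply phase that pairs column k of A with row k of B, followed by a separate merge phase that accumulates partial products), not the accelerator's actual implementation. The function name `outer_product_spgemm` and the dictionary-based storage layout are assumptions made for illustration; the abstract does not specify data formats.

```python
from collections import defaultdict


def outer_product_spgemm(a_cols, b_rows):
    """Compute C = A * B for sparse A and B using outer products.

    Hypothetical layout (an assumption, not taken from the paper):
      a_cols[k] -> list of (i, value) for the non-zeros in column k of A
      b_rows[k] -> list of (j, value) for the non-zeros in row k of B
    Returns a dict mapping (i, j) -> value for the non-zeros of C.
    """
    # Multiply phase: column k of A times row k of B yields one
    # partial-product matrix. No accumulation happens here.
    partials = []
    for k, col in a_cols.items():
        row = b_rows.get(k, [])
        for i, a_val in col:
            for j, b_val in row:
                partials.append((i, j, a_val * b_val))

    # Merge phase: sum the partial products that share an output index.
    c = defaultdict(float)
    for i, j, value in partials:
        c[(i, j)] += value
    return dict(c)


# A = [[1, 0], [2, 3]] stored by column, B = [[0, 4], [5, 0]] stored by row.
a_cols = {0: [(0, 1.0), (1, 2.0)], 1: [(1, 3.0)]}
b_rows = {0: [(1, 4.0)], 1: [(0, 5.0)]}
result = outer_product_spgemm(a_cols, b_rows)
```

In this sketch each stored column of A and row of B is visited once in the multiply phase, which is the property the abstract credits with removing redundant accesses; the cost moves to the merge phase, which must combine the partial products.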

Pages: 724-736

Page count: 13
