HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures

Cited by: 0

Authors:

Gerogiannis, Gerasimos [1,2]

Aananthakrishnan, Sriram [1]

Torrellas, Josep [2]

Hur, Ibrahim [1]

Affiliations:

[1] Intel Corp, Santa Clara, CA 95051 USA

[2] Univ Illinois, Urbana, IL 61801 USA

Source:

2024 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA 2024 | 2024

Keywords:

PERFORMANCE; ACCURATE;

DOI:

10.1109/HPCA57654.2024.00081

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Sparse Matrix Dense Matrix Multiplication (SpMM) is an important kernel with application across a wide range of domains, including machine learning and linear algebra solvers. In many sparse matrices, the pattern of nonzeros is nonuniform: nonzeros form dense and sparse regions, rather than being uniformly distributed across the whole matrix. We refer to this property as Intra-Matrix Heterogeneity (IMH). Currently, SpMM accelerator designs do not leverage this heterogeneity. They employ the same processing elements (PEs) for all the regions of a sparse matrix, resulting in suboptimal acceleration. To address this limitation, we utilize heterogeneous SpMM accelerator architectures, which include different types of PEs to exploit IMH. We develop an analytical modeling framework to predict the performance of different types of accelerator PEs taking into account IMH. Furthermore, we present a heuristic for partitioning sparse matrices among heterogeneous PEs. We call our matrix modeling and partitioning method HotTiles. To evaluate HotTiles, we simulate three different heterogeneous architectures. Each one consists of two types of workers (i.e., PEs): one suited for compute-bound denser regions (Hot Worker) and one for memory-bound sparser regions (Cold Worker). Our results show that exploiting IMH with HotTiles is very effective. Depending on the architecture, heterogeneous execution with HotTiles outperforms homogeneous execution using only hot or only cold workers by 9.2-16.8x and 1.4-3.7x, respectively. In addition, HotTiles outperforms the best worker type used on a per-matrix basis by 1.3-2.5x. Finally, HotTiles outperforms an IMH-unaware heterogeneous execution strategy by 1.4-2.2x.
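To make the idea of Intra-Matrix Heterogeneity concrete, the following is a minimal Python sketch that splits a sparse matrix into tiles and labels each nonempty tile as hot or cold by its nonzero density. The function names, the coordinate-list input, and the fixed `density_threshold` are all assumptions made for illustration. The abstract describes an analytical performance model and a partitioning heuristic, and this threshold rule is only a simplified stand-in for them, not the HotTiles method itself.

```python
from collections import Counter


def tile_nonzero_counts(coords, tile_rows, tile_cols):
    """Count nonzeros per tile for a sparse matrix given as (row, col) pairs."""
    counts = Counter()
    for r, c in coords:
        counts[(r // tile_rows, c // tile_cols)] += 1
    return counts


def partition_tiles(coords, tile_rows, tile_cols, density_threshold):
    """Assign each nonempty tile to a hot or cold list by nonzero density.

    The fixed threshold is a hypothetical stand-in for a model-driven
    decision; it is not the heuristic described in the abstract.
    """
    tile_area = tile_rows * tile_cols
    hot, cold = [], []
    counts = tile_nonzero_counts(coords, tile_rows, tile_cols)
    for tile, nnz in sorted(counts.items()):
        if nnz / tile_area >= density_threshold:
            hot.append(tile)   # denser region: candidate for a Hot Worker
        else:
            cold.append(tile)  # sparser region: candidate for a Cold Worker
    return hot, cold


# A 4x4 matrix with a fully dense top-left 2x2 block and one stray nonzero.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 2)]
hot, cold = partition_tiles(coords, tile_rows=2, tile_cols=2, density_threshold=0.5)
```

In this toy input the dense block lands in `hot` and the tile holding the stray nonzero lands in `cold`. A real partitioner would presumably weigh predicted execution time on each worker type rather than raw density alone.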


Pages: 1012-1028

Page count: 17
