HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures

Cited: 0
Authors
Gerogiannis, Gerasimos [1 ,2 ]
Aananthakrishnan, Sriram [1 ]
Torrellas, Josep [2 ]
Hur, Ibrahim [1 ]
Affiliations
[1] Intel Corp, Santa Clara, CA 95051 USA
[2] Univ Illinois, Urbana, IL 61801 USA
Keywords
PERFORMANCE; ACCURATE
DOI
10.1109/HPCA57654.2024.00081
Chinese Library Classification (CLC)
TP3 [Computing technology and computer technology]
Discipline Classification Code
0812
Abstract
Sparse Matrix-Dense Matrix Multiplication (SpMM) is an important kernel with applications across a wide range of domains, including machine learning and linear algebra solvers. In many sparse matrices, the pattern of nonzeros is nonuniform: nonzeros form dense and sparse regions, rather than being uniformly distributed across the whole matrix. We refer to this property as Intra-Matrix Heterogeneity (IMH). Currently, SpMM accelerator designs do not leverage this heterogeneity. They employ the same processing elements (PEs) for all regions of a sparse matrix, resulting in suboptimal acceleration. To address this limitation, we utilize heterogeneous SpMM accelerator architectures, which include different types of PEs to exploit IMH. We develop an analytical modeling framework to predict the performance of different types of accelerator PEs while taking IMH into account. Furthermore, we present a heuristic for partitioning sparse matrices among heterogeneous PEs. We call our matrix modeling and partitioning method HotTiles. To evaluate HotTiles, we simulate three different heterogeneous architectures. Each one consists of two types of workers (i.e., PEs): one suited for compute-bound denser regions (Hot Worker) and one for memory-bound sparser regions (Cold Worker). Our results show that exploiting IMH with HotTiles is very effective. Depending on the architecture, heterogeneous execution with HotTiles outperforms homogeneous execution using only hot or only cold workers by 9.2-16.8x and 1.4-3.7x, respectively. In addition, HotTiles outperforms the best worker type used on a per-matrix basis by 1.3-2.5x. Finally, HotTiles outperforms an IMH-unaware heterogeneous execution strategy by 1.4-2.2x.
Pages: 1012-1028
Page count: 17
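The abstract above outlines the core idea of steering a matrix's denser regions to compute-bound hot workers and its sparser regions to memory-bound cold workers. As a rough illustration only, the Python sketch below partitions a CSR matrix into fixed-size tiles and labels each tile hot or cold with a simple density threshold; the partition_tiles helper, the 256x256 tile size, and the 5% threshold are illustrative assumptions and do not reproduce the paper's analytical model or partitioning heuristic.

import numpy as np
from scipy.sparse import random as sparse_random

def partition_tiles(csr, tile_rows=256, tile_cols=256, density_threshold=0.05):
    """Assign each fixed-size tile of a sparse matrix to a 'hot' (denser,
    compute-bound) or 'cold' (sparser, memory-bound) pool based on its
    nonzero density. Illustrative threshold rule, not the HotTiles heuristic."""
    n_rows, n_cols = csr.shape
    hot, cold = [], []
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            tile = csr[r0:r0 + tile_rows, c0:c0 + tile_cols]
            density = tile.nnz / (tile.shape[0] * tile.shape[1])
            (hot if density >= density_threshold else cold).append((r0, c0, tile))
    return hot, cold

if __name__ == "__main__":
    # Build a matrix with a dense block embedded in a sparse background,
    # mimicking intra-matrix heterogeneity (IMH).
    rng = np.random.default_rng(0)
    A = sparse_random(1024, 1024, density=0.001, format="csr", random_state=rng)
    A = A.tolil()
    A[:128, :128] = rng.random((128, 128))  # dense "hot" region
    A = A.tocsr()
    hot, cold = partition_tiles(A)
    print(f"hot tiles: {len(hot)}, cold tiles: {len(cold)}")

In this toy setup the tile containing the dense block lands in the hot pool and the remaining tiles in the cold pool; the paper's contribution is deciding such assignments with an analytical performance model of each worker type rather than a fixed density cutoff.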