HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures

Times cited: 0
Authors
Gerogiannis, Gerasimos [1, 2]
Aananthakrishnan, Sriram [1]
Torrellas, Josep [2]
Hur, Ibrahim [1]
Affiliations
[1] Intel Corp, Santa Clara, CA 95051 USA
[2] Univ Illinois, Urbana, IL 61801 USA
Keywords
PERFORMANCE; ACCURATE
DOI
10.1109/HPCA57654.2024.00081
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Sparse Matrix-Dense Matrix Multiplication (SpMM) is an important kernel with applications across a wide range of domains, including machine learning and linear algebra solvers. In many sparse matrices, the pattern of nonzeros is nonuniform: nonzeros form dense and sparse regions rather than being uniformly distributed across the whole matrix. We refer to this property as Intra-Matrix Heterogeneity (IMH). Current SpMM accelerator designs do not leverage this heterogeneity. They employ the same processing elements (PEs) for all regions of a sparse matrix, resulting in suboptimal acceleration. To address this limitation, we utilize heterogeneous SpMM accelerator architectures, which include different types of PEs to exploit IMH. We develop an analytical modeling framework that predicts the performance of different types of accelerator PEs, taking IMH into account. Furthermore, we present a heuristic for partitioning sparse matrices among heterogeneous PEs. We call our matrix modeling and partitioning method HotTiles. To evaluate HotTiles, we simulate three different heterogeneous architectures. Each one consists of two types of workers (i.e., PEs): one suited for compute-bound denser regions (Hot Worker) and one for memory-bound sparser regions (Cold Worker). Our results show that exploiting IMH with HotTiles is very effective. Depending on the architecture, heterogeneous execution with HotTiles outperforms homogeneous execution using only hot or only cold workers by 9.2-16.8× and 1.4-3.7×, respectively. In addition, HotTiles outperforms the best worker type chosen on a per-matrix basis by 1.3-2.5×. Finally, HotTiles outperforms an IMH-unaware heterogeneous execution strategy by 1.4-2.2×.
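The abstract does not spell out HotTiles' analytical model or its partitioning heuristic. The Python below is only a minimal sketch of the general idea under assumptions of its own: the matrix is cut into square tiles, each tile gets an estimated cost on a hot worker and on a cold worker, and tiles are split between the two workers. Everything in it is hypothetical: the names `Tile`, `tile_nnz`, `hot_cost`, `cold_cost`, `partition`, the `hot_rate`/`cold_rate` parameters, the simplistic cost model (hot cost proportional to tile area, cold cost proportional to nonzero count), and the greedy earliest-finish assignment, which is a generic list-scheduling heuristic rather than the paper's method.

```python
# Hypothetical sketch of IMH-aware partitioning between one hot and one cold
# worker. The cost model and greedy heuristic are illustrative assumptions,
# not the HotTiles model or heuristic described in the paper.
from dataclasses import dataclass


@dataclass
class Tile:
    row: int  # tile-row index
    col: int  # tile-column index
    nnz: int  # number of nonzeros inside the tile


def tile_nnz(coords, tile):
    """Count nonzeros per (tile x tile) block of a matrix given as (row, col) pairs."""
    counts = {}
    for r, c in coords:
        key = (r // tile, c // tile)
        counts[key] = counts.get(key, 0) + 1
    # Only non-empty tiles are returned, ordered by (tile-row, tile-column).
    return [Tile(i, j, n) for (i, j), n in sorted(counts.items())]


def hot_cost(tile, hot_rate):
    # Assumption: the hot (compute-bound) worker processes every position of
    # a tile as if it were dense, so its cost depends on area, not on nnz.
    return tile * tile / hot_rate


def cold_cost(t, cold_rate):
    # Assumption: the cold (memory-bound) worker pays a fixed cost per nonzero.
    return t.nnz / cold_rate


def partition(tiles, tile, hot_rate, cold_rate):
    """Greedy earliest-finish assignment, densest tiles considered first."""
    hot_load = cold_load = 0.0
    assign = {}
    for t in sorted(tiles, key=lambda t: t.nnz, reverse=True):
        h = hot_load + hot_cost(tile, hot_rate)
        c = cold_load + cold_cost(t, cold_rate)
        if h <= c:  # the tile finishes earlier on the hot worker
            hot_load = h
            assign[(t.row, t.col)] = "hot"
        else:
            cold_load = c
            assign[(t.row, t.col)] = "cold"
    # Both workers run in parallel, so the estimated time is the larger load.
    return assign, max(hot_load, cold_load)


if __name__ == "__main__":
    # 8x8 matrix: a fully dense 4x4 block in the top-left corner plus three
    # scattered nonzeros elsewhere (IMH in miniature).
    coords = [(r, c) for r in range(4) for c in range(4)] + [(0, 5), (5, 0), (6, 1)]
    tiles = tile_nnz(coords, tile=4)
    assign, makespan = partition(tiles, tile=4, hot_rate=8.0, cold_rate=1.0)
    print(assign)    # {(0, 0): 'hot', (1, 0): 'cold', (0, 1): 'cold'}
    print(makespan)  # 3.0
    # For comparison: every tile on the hot worker, or every tile on the cold one.
    print(sum(hot_cost(4, 8.0) for _ in tiles))   # 6.0
    print(sum(cold_cost(t, 1.0) for t in tiles))  # 19.0
```

With these made-up rates, the dense corner tile is cheaper on the hot worker and the sparse tiles are cheaper on the cold worker, so splitting them yields an estimated time of 3.0 versus 6.0 (all hot) and 19.0 (all cold). The numbers only illustrate why a partitioner that sees IMH can beat either homogeneous option; they are not results from the paper.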
Pages: 1012-1028
Page count: 17
Related papers
  • [1] Accelerating SLIDE: Exploiting Sparsity on Accelerator Architectures
    Ko, Sho
    Rucker, Alexander
    Zhang, Yaqi
    Mure, Paul
    Olukotun, Kunle
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022, pp. 663-670
  • [2] An Analysis of Accelerator Coupling in Heterogeneous Architectures
    Cota, Emilio G.
    Mantovani, Paolo
    Di Guglielmo, Giuseppe
    Carloni, Luca P.
    2015 52ND ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2015
  • [3] Accelerating the computation of FLAPW methods on heterogeneous architectures
    Davidovic, Davor
    Fabregat-Traver, Diego
    Hoehnerbach, Markus
    Di Napoli, Edoardo
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (24)
  • [4] Accelerating Video Captioning on Heterogeneous System Architectures
    Huang, Horng-Ruey
    Hong, Ding-Yong
    Wu, Jan-Jan
    Chen, Kung-Fu
    Liu, Pangfeng
    Hsu, Wei-Chung
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2022, 19 (03)
  • [5] Shared Hardware Accelerator Architectures for Heterogeneous MPSoCs
    Bouthaina, Damak
    Baklouti, Mouna
    Niar, Smail
    Abid, Mohamed
    2013 8TH INTERNATIONAL WORKSHOP ON RECONFIGURABLE AND COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC), 2013
  • [6] Big Data Analytics on Heterogeneous Accelerator Architectures
    Neshatpour, Katayoun
    Sasan, Avesta
    Homayoun, Houman
    2016 INTERNATIONAL CONFERENCE ON HARDWARE/SOFTWARE CODESIGN AND SYSTEM SYNTHESIS (CODES+ISSS), 2016
  • [7] SPADE: A Flexible and Scalable Accelerator for SpMM and SDDMM
    Gerogiannis, Gerasimos
    Yesil, Serif
    Lenadora, Damitha
    Cao, Dingyuan
    Mendis, Charith
    Torrellas, Josep
    PROCEEDINGS OF THE 2023 THE 50TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, ISCA 2023, 2023, pp. 260-274
  • [8] Scheduling Methods for Accelerating Applications on Architectures with Heterogeneous Cores
    Chen, Linchuan
    Huo, Xin
    Agrawal, Gagan
    PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2014, pp. 48-57
  • [9] Accelerating network applications by distributed interfaces on heterogeneous multiprocessor architectures
    Cascon, Pablo
    Ortiz, Andres
    Ortega, Julio
    Diaz, Antonio F.
    Rojas, Ignacio
    JOURNAL OF SUPERCOMPUTING, 2011, 58 (03), pp. 302-313