HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures

Authors
Gerogiannis, Gerasimos [1, 2]
Aananthakrishnan, Sriram [1]
Torrellas, Josep [2]
Hur, Ibrahim [1]
Affiliations
[1] Intel Corp, Santa Clara, CA 95051 USA
[2] Univ Illinois, Urbana, IL 61801 USA
Keywords
PERFORMANCE; ACCURATE
DOI
10.1109/HPCA57654.2024.00081
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Sparse Matrix Dense Matrix Multiplication (SpMM) is an important kernel with applications across a wide range of domains, including machine learning and linear algebra solvers. In many sparse matrices, the pattern of nonzeros is nonuniform: nonzeros form dense and sparse regions, rather than being uniformly distributed across the whole matrix. We refer to this property as Intra-Matrix Heterogeneity (IMH). Currently, SpMM accelerator designs do not leverage this heterogeneity. They employ the same processing elements (PEs) for all the regions of a sparse matrix, resulting in suboptimal acceleration. To address this limitation, we use heterogeneous SpMM accelerator architectures, which include different types of PEs to exploit IMH. We develop an analytical modeling framework to predict the performance of different types of accelerator PEs, taking IMH into account. Furthermore, we present a heuristic for partitioning sparse matrices among heterogeneous PEs. We call our matrix modeling and partitioning method HotTiles. To evaluate HotTiles, we simulate three different heterogeneous architectures. Each one consists of two types of workers (i.e., PEs): one suited for compute-bound denser regions (Hot Worker) and one for memory-bound sparser regions (Cold Worker). Our results show that exploiting IMH with HotTiles is very effective. Depending on the architecture, heterogeneous execution with HotTiles outperforms homogeneous execution using only hot or only cold workers by 9.2-16.8x and 1.4-3.7x, respectively. In addition, HotTiles outperforms the best worker type used on a per-matrix basis by 1.3-2.5x. Finally, HotTiles outperforms an IMH-unaware heterogeneous execution strategy by 1.4-2.2x.
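The abstract describes splitting a sparse matrix between a compute-bound hot worker and a memory-bound cold worker. The Python sketch below illustrates that general idea under loudly hypothetical assumptions: the tiling, both cost formulas (hot_cost, cold_cost), the hardware rates, and the greedy makespan rule in partition are illustrative simplifications, not the HotTiles analytical model or partitioning heuristic from the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Tile:
    tile_row: int
    tile_col: int
    nnz: int


def build_tiles(coords, tile_size):
    """Group (row, col) nonzero coordinates into tile_size x tile_size tiles."""
    counts = Counter((r // tile_size, c // tile_size) for r, c in coords)
    return [Tile(tr, tc, n) for (tr, tc), n in sorted(counts.items())]


def hot_cost(tile_size, k, macs_per_cycle):
    # Hypothetical compute-bound model: the hot worker treats the tile as
    # dense, so it performs tile_size * tile_size * k MACs regardless of nnz.
    return tile_size * tile_size * k / macs_per_cycle


def cold_cost(tile, k, bytes_per_cycle, bytes_per_value=4):
    # Hypothetical memory-bound model: for every nonzero A[i, j], the cold
    # worker fetches row j of the dense operand (k values).
    return tile.nnz * k * bytes_per_value / bytes_per_cycle


def partition(tiles, tile_size, k, macs_per_cycle, bytes_per_cycle):
    """Greedily move tiles from the cold worker to the hot worker."""
    scored = [(t, hot_cost(tile_size, k, macs_per_cycle),
               cold_cost(t, k, bytes_per_cycle)) for t in tiles]
    # Visit tiles in order of how much the hot worker speeds them up
    # (denser tiles first); the sort is stable, so ties keep tile order.
    scored.sort(key=lambda s: s[2] / s[1], reverse=True)
    hot, cold = [], []
    hot_load = 0.0
    cold_load = sum(c for _, _, c in scored)  # start with every tile on cold
    for tile, h, c in scored:
        # Both workers are assumed to run concurrently, so the finish time is
        # the larger of the two loads; move a tile only if that shrinks.
        if max(hot_load + h, cold_load - c) < max(hot_load, cold_load):
            hot.append(tile)
            hot_load += h
            cold_load -= c
        else:
            cold.append(tile)
    return hot, cold, max(hot_load, cold_load)


# Example: a dense 8x8 block in the top-left corner plus a sparse diagonal.
coords = [(r, c) for r in range(8) for c in range(8)]
coords += [(i, i) for i in range(8, 64)]
tiles = build_tiles(coords, tile_size=8)
hot, cold, makespan = partition(tiles, tile_size=8, k=16,
                                macs_per_cycle=64, bytes_per_cycle=16)
print([(t.tile_row, t.tile_col) for t in hot], makespan)
```

With these made-up parameters the script prints `[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)] 96.0`: the dense tile is moved to the hot worker first, and a few diagonal tiles follow only to balance the two workers' loads. A per-tile density view like this is one simple way to make a partitioner IMH-aware; a real design would replace both cost functions with a calibrated performance model.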
Pages: 1012-1028
Page count: 17