HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures

Cited: 0
Authors
Gerogiannis, Gerasimos [1 ,2 ]
Aananthakrishnan, Sriram [1 ]
Torrellas, Josep [2 ]
Hur, Ibrahim [1 ]
Affiliations
[1] Intel Corp, Santa Clara, CA 95051 USA
[2] Univ Illinois, Urbana, IL 61801 USA
Keywords
PERFORMANCE; ACCURATE
DOI
10.1109/HPCA57654.2024.00081
Chinese Library Classification (CLC)
TP3 [Computing technology and computer technology]
Discipline Classification Code
0812
Abstract
Sparse Matrix-Dense Matrix Multiplication (SpMM) is an important kernel with applications across a wide range of domains, including machine learning and linear algebra solvers. In many sparse matrices, the pattern of nonzeros is nonuniform: nonzeros form dense and sparse regions, rather than being uniformly distributed across the whole matrix. We refer to this property as Intra-Matrix Heterogeneity (IMH). Currently, SpMM accelerator designs do not leverage this heterogeneity. They employ the same processing elements (PEs) for all regions of a sparse matrix, resulting in suboptimal acceleration. To address this limitation, we utilize heterogeneous SpMM accelerator architectures, which include different types of PEs to exploit IMH. We develop an analytical modeling framework to predict the performance of different types of accelerator PEs while taking IMH into account. Furthermore, we present a heuristic for partitioning sparse matrices among heterogeneous PEs. We call our matrix modeling and partitioning method HotTiles. To evaluate HotTiles, we simulate three different heterogeneous architectures. Each one consists of two types of workers (i.e., PEs): one suited for compute-bound denser regions (Hot Worker) and one for memory-bound sparser regions (Cold Worker). Our results show that exploiting IMH with HotTiles is very effective. Depending on the architecture, heterogeneous execution with HotTiles outperforms homogeneous execution using only hot or only cold workers by 9.2-16.8x and 1.4-3.7x, respectively. In addition, HotTiles outperforms the best worker type used on a per-matrix basis by 1.3-2.5x. Finally, HotTiles outperforms an IMH-unaware heterogeneous execution strategy by 1.4-2.2x.
Pages: 1012-1028
Page count: 17
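The abstract above outlines the core idea of steering a matrix's denser regions to compute-bound hot workers and its sparser regions to memory-bound cold workers. As a rough illustration only, the Python sketch below partitions a CSR matrix into fixed-size tiles and labels each tile hot or cold with a simple density threshold; the partition_tiles helper, the 256x256 tile size, and the 5% threshold are illustrative assumptions and do not reproduce the paper's analytical model or partitioning heuristic.

import numpy as np
from scipy.sparse import random as sparse_random

def partition_tiles(csr, tile_rows=256, tile_cols=256, density_threshold=0.05):
    """Assign each fixed-size tile of a sparse matrix to a 'hot' (denser,
    compute-bound) or 'cold' (sparser, memory-bound) pool based on its
    nonzero density. Illustrative threshold rule, not the HotTiles heuristic."""
    n_rows, n_cols = csr.shape
    hot, cold = [], []
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            tile = csr[r0:r0 + tile_rows, c0:c0 + tile_cols]
            density = tile.nnz / (tile.shape[0] * tile.shape[1])
            (hot if density >= density_threshold else cold).append((r0, c0, tile))
    return hot, cold

if __name__ == "__main__":
    # Build a matrix with a dense block embedded in a sparse background,
    # mimicking intra-matrix heterogeneity (IMH).
    rng = np.random.default_rng(0)
    A = sparse_random(1024, 1024, density=0.001, format="csr", random_state=rng)
    A = A.tolil()
    A[:128, :128] = rng.random((128, 128))  # dense "hot" region
    A = A.tocsr()
    hot, cold = partition_tiles(A)
    print(f"hot tiles: {len(hot)}, cold tiles: {len(cold)}")

In this toy setup the tile containing the dense block lands in the hot pool and the remaining tiles in the cold pool; the paper's contribution is deciding such assignments with an analytical performance model of each worker type rather than a fixed density cutoff.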