SparseAdapt: Runtime Control for Sparse Linear Algebra on a Reconfigurable Accelerator

Cited by: 9
Authors
Pal, Subhankar [1]
Amarnath, Aporva [1]
Feng, Siying [1]
O'Boyle, Michael [2]
Dreslinski, Ronald [1]
Dubach, Christophe [3]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[3] McGill Univ, Montreal, PQ, Canada
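Funding
Natural Sciences and Engineering Research Council of Canada (NSERC); Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords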
reconfigurable accelerators; sparse linear algebra; energy-efficient computing; machine learning; predictive models;
DOI
10.1145/3466752.3480134
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Dynamic adaptation is a post-silicon optimization technique that adapts the hardware to workload phases. However, current adaptive approaches are oblivious to implicit phases that arise from operating on irregular data, such as sparse linear algebra operations. Implicit phases are short-lived and do not exhibit consistent behavior throughout execution. This calls for a high-accuracy, low-overhead runtime mechanism for adaptation at a fine granularity. Moreover, adopting such techniques for reconfigurable manycore hardware, such as coarse-grained reconfigurable architectures (CGRAs), adds complexity due to synchronization and resource contention. We propose a lightweight machine learning-based adaptive framework called SparseAdapt. It enables low-overhead control of configuration parameters to tailor the hardware to both implicit (data-driven) and explicit (code-driven) phase changes. SparseAdapt is implemented within the runtime of a recently proposed CGRA called Transmuter, which has been shown to deliver high performance for irregular sparse operations. SparseAdapt can adapt configuration parameters such as resource sharing, cache capacities, prefetcher aggressiveness, and dynamic voltage-frequency scaling (DVFS). Moreover, it can operate under the constraints of either (i) high energy-efficiency (maximal GFLOPS/W), or (ii) high power-performance (maximal GFLOPS^3/W). We evaluate SparseAdapt with sparse matrix-matrix and matrix-vector multiplication (SpMSpM and SpMSpV) routines across a suite of uniform random, power-law and real-world matrices, in addition to end-to-end evaluation on two graph algorithms. SparseAdapt achieves similar performance on SpMSpM as the largest static configuration, with 5.3x better energy-efficiency. Furthermore, on both performance and efficiency, SparseAdapt is within 13% of an Oracle that adapts the configuration of each phase with global knowledge of the entire program execution. Finally, SparseAdapt is able to outperform the state-of-the-art approach for runtime reconfiguration by up to 2.9x in terms of energy-efficiency.
Pages: 1005-1021
Page count: 17