SPARCE: Sparsity Aware General-Purpose Core Extensions to Accelerate Deep Neural Networks

Cited by: 20
Authors
Sen, Sanchari [1]
Jain, Shubham [1]
Venkataramani, Swagath [2]
Raghunathan, Anand [1]
Affiliations
[1] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47906 USA
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Funding
U.S. National Science Foundation (NSF);
Keywords
Deep learning; deep neural networks; sparsity; general purpose processors;
DOI
10.1109/TC.2018.2879434
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Deep Neural Networks (DNNs) have emerged as the method of choice for solving a wide range of machine learning tasks. The enormous computational demand posed by DNNs is a key challenge for computing system designers and has most commonly been addressed through the design of DNN accelerators. However, these specialized accelerators utilize large quantities of multiply-accumulate units and on-chip memory, and are prohibitive in area- and cost-constrained systems such as wearable devices and IoT sensors. In this work, we take a complementary approach and improve the performance of DNNs on general-purpose processor (GPP) cores. We do so by exploiting a key attribute of DNNs, viz. sparsity, or the prevalence of zero values. We propose Sparsity-aware Core Extensions (SPARCE), a set of low-overhead micro-architectural and ISA extensions that dynamically detect whether an operand (e.g., the result of a load instruction) is zero and subsequently skip a set of future instructions that use it. To maximize performance benefits, SPARCE ensures that the instructions to be skipped are prevented from even being fetched, since squashing instructions comes with a penalty (e.g., a pipeline stall). SPARCE consists of two key micro-architectural enhancements. First, a Sparsity Register File (SpRF) is utilized to track registers that are zero. Next, a Sparsity-Aware Skip Address (SASA) table is used to indicate instruction sequences that can be skipped, and to specify conditions on SpRF registers that trigger instruction skipping. When an instruction is fetched, SPARCE dynamically pre-identifies whether the following instruction(s) can be skipped, and if so appropriately modifies the program counter, thereby skipping the redundant instructions and improving performance. We model SPARCE using the gem5 architectural simulator, and evaluate our approach on six state-of-the-art image-recognition DNNs in the context of both training and inference using the Caffe deep learning framework. On a scalar microprocessor, SPARCE achieves 1.11x-1.96x speedups across both convolution and fully-connected layers that exhibit 10-90 percent sparsity. These speedups translate to a 19-31 percent reduction in execution time at the overall application level. We also evaluate SPARCE on a 4-way SIMD ARMv8 processor using the OpenBLAS library, and demonstrate that SPARCE achieves an 8-15 percent reduction in application-level execution time.
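The skip mechanism described in the abstract is concrete enough to sketch in software. The short Python model below is a hypothetical illustration, not the paper's actual design: it represents the SpRF as one zero bit per register and the SASA table as a map from a fetch PC to a (watched register, skip length) entry, then runs a toy dot-product inner loop in which the multiply-accumulate instructions for a zero activation are never fetched. The toy ISA, the SASA entry format, and the register assignments are all assumptions made for this sketch.

```python
from dataclasses import dataclass

@dataclass
class SASAEntry:
    watched_reg: int  # SpRF bit that gates the skip (assumed format)
    skip_len: int     # number of instructions to jump over

def run(program, sasa, data, num_regs=8):
    """Execute a toy (op, dst, srcs...) program with SPARCE-style skipping."""
    regs = [0] * num_regs
    sprf = [True] * num_regs  # SpRF: True means the register holds zero
    pc = 0
    executed = 0
    while pc < len(program):
        entry = sasa.get(pc)
        # Fetch-stage check: if the SASA table has an entry for this PC and
        # the watched register is zero, advance the PC past the dependent
        # instructions so they are never even fetched.
        if entry is not None and sprf[entry.watched_reg]:
            pc += entry.skip_len
            continue
        op, dst, *srcs = program[pc]
        if op == "load":   # load dst, addr
            regs[dst] = data[srcs[0]]
        elif op == "mul":  # mul dst, src1, src2
            regs[dst] = regs[srcs[0]] * regs[srcs[1]]
        elif op == "add":  # add dst, src1, src2
            regs[dst] = regs[srcs[0]] + regs[srcs[1]]
        sprf[dst] = (regs[dst] == 0)  # update the destination's zero bit
        executed += 1
        pc += 1
    return regs, executed

# Inner loop of a dot product over sparse activations: r3 += a[i] * w[i].
a = [0, 5, 0, 2]  # activations (50 percent sparse)
w = [3, 4, 7, 1]  # weights, stored right after the activations
program = []
for i in range(len(a)):
    program += [
        ("load", 1, i),           # r1 = a[i]
        ("load", 2, len(a) + i),  # r2 = w[i]
        ("mul",  4, 1, 2),        # r4 = r1 * r2
        ("add",  3, 3, 4),        # r3 += r4
    ]
# One SASA entry per iteration: right after a[i] is loaded into r1,
# skip the remaining load/mul/add if r1 turned out to be zero.
sasa = {4 * i + 1: SASAEntry(watched_reg=1, skip_len=3) for i in range(len(a))}

regs, executed = run(program, sasa, a + w)
print(regs[3], executed)  # 22 10 -- only 10 of 16 instructions executed
```

On this 50-percent-sparse toy input, only 10 of the 16 dynamic instructions execute, which is the effect the abstract describes: sparsity is converted into skipped fetches rather than into squashed instructions that would incur a pipeline penalty.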
Pages: 912-925 (14 pages)