Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators

Cited by: 1
Authors
Pal, Subhankar [1]
Venkataramani, Swagath [2]
Srinivasan, Viji [2]
Gopalakrishnan, Kailash [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
Keywords
PERFORMANCE;
DOI
10.1109/ISPASS51385.2021.00046
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
A prevalent challenge for Deep Learning (DL) accelerators is how they are programmed to sustain utilization without impacting end-user productivity. Little prior effort has been devoted to the effective management of their on-chip Scratch-Pad Memory (SPM) across the DL operations of a Deep Neural Network (DNN). This is especially critical due to trends in complex network topologies and the emergence of eager execution. This work demonstrates that there exists up to a 5.2x performance gap in DL inference to be bridged using SPM management, on a set of image, object and language networks. We propose OnSRAM, a novel SPM management framework integrated with a DL accelerator runtime. OnSRAM has two variants, viz. OnSRAM-Static, which works on static graphs to identify data structures that should be held on-chip based on their properties, and OnSRAM-Eager, which targets an eager execution model (no graph) and uses a speculative scheme to hold/discard data structures. On a prototypical DL accelerator, OnSRAM-Static and OnSRAM-Eager achieve reductions in inference latency (batch size of 1) of 1.02-4.8x and 1.02-3.1x, respectively, over a baseline with no SPM management.
Pages: 240-242
Page count: 3