Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators

Authors
Pal, Subhankar [1]
Venkataramani, Swagath [2]
Srinivasan, Viji [2]
Gopalakrishnan, Kailash [2]
Affiliations
[1] University of Michigan, Ann Arbor, MI 48109 USA
[2] IBM T. J. Watson Research Center, Yorktown Heights, NY USA
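Keywords
PERFORMANCE;
DOI
10.1109/ISPASS51385.2021.00046
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract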
A prevalent challenge for Deep Learning (DL) accelerators is how to program them to sustain high utilization without hurting end-user productivity. Little prior work has addressed the effective management of their on-chip Scratch-Pad Memory (SPM) across the DL operations of a Deep Neural Network (DNN). This is especially critical given the trend toward complex network topologies and the emergence of eager execution. This work shows that, on a set of image, object, and language networks, SPM management can bridge a performance gap of up to 5.2x in DL inference. We propose OnSRAM, a novel SPM management framework integrated with a DL accelerator runtime. OnSRAM has two variants: OnSRAM-Static, which operates on static graphs and identifies, based on their properties, the data structures that should be held on-chip; and OnSRAM-Eager, which targets an eager execution model (no graph) and uses a speculative scheme to decide which data structures to hold or discard. On a prototypical DL accelerator, OnSRAM-Static and OnSRAM-Eager reduce inference latency (batch size 1) by 1.02-4.8x and 1.02-3.1x, respectively, relative to a baseline with no SPM management.
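The abstract describes OnSRAM-Static only at a high level: on a static graph, decide which data structures to keep on-chip based on their properties. The sketch below is a generic greedy heuristic for that kind of decision, not OnSRAM's actual algorithm. The `Tensor` fields, the `pin_tensors` function, and the DRAM-traffic benefit model are hypothetical assumptions made for illustration. It assumes ops are numbered in topological order and that each tensor is live from its producing op through its last reader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tensor:
    """Hypothetical tensor descriptor for a statically known graph."""
    name: str
    size: int       # bytes
    producer: int   # index (topological order) of the op that writes it
    last_use: int   # index of the last op that reads it
    num_reads: int  # number of ops that read it


def pin_tensors(tensors, num_ops, spm_capacity):
    """Greedily choose tensors to keep resident in the SPM (hypothetical).

    Assumed benefit model: a pinned tensor saves one off-chip write plus one
    off-chip read per consumer, i.e. size * (1 + num_reads) bytes of DRAM
    traffic. Its cost is the SPM space it occupies over its live interval,
    size * lifetime, so the ranking key reduces to (1 + num_reads) / lifetime.
    """
    usage = [0] * num_ops  # pinned SPM bytes live at each op
    pinned = []

    def density(t):
        lifetime = t.last_use - t.producer + 1
        return (1 + t.num_reads) / lifetime

    # Highest benefit per byte-op first; break ties by name for determinism.
    for t in sorted(tensors, key=lambda x: (-density(x), x.name)):
        span = range(t.producer, t.last_use + 1)
        # Pin only if the capacity holds at every op where the tensor is live.
        if all(usage[i] + t.size <= spm_capacity for i in span):
            for i in span:
                usage[i] += t.size
            pinned.append(t.name)
    return pinned


graph = [
    Tensor("a", size=40, producer=0, last_use=1, num_reads=1),
    Tensor("b", size=50, producer=1, last_use=3, num_reads=2),
    Tensor("c", size=30, producer=2, last_use=3, num_reads=2),
]
print(pin_tensors(graph, num_ops=4, spm_capacity=64))  # ['c', 'a']
```

With a 64-byte SPM, `c` (two reads, short lifetime) is pinned first, then `a`; `b` no longer fits within the remaining capacity. Ranking by benefit per byte-op of occupancy favors short-lived, heavily reused tensors; a real allocator would also have to account for tiling, double-buffering, and fragmentation.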
Pages: 240-242
Page count: 3