iCELIA: A Full-Stack Framework for STT-MRAM-Based Deep Learning Acceleration

Cited by: 14
Authors
Yan, Hao [1 ]
Cherian, Hebin R. [2 ]
Ahn, Ethan C. [2 ]
Qian, Xuehai [3 ,4 ]
Duan, Lide [5 ]
Affiliations
[1] Samsung, Austin, TX 78746 USA
[2] Univ Texas San Antonio, Dept Elect & Comp Engn, San Antonio, TX 78249 USA
[3] Univ Southern Calif, Ming Hsieh Dept Elect Engn, Los Angeles, CA 90089 USA
[4] Univ Southern Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
[5] Alibaba DAMO Acad, Comp Technol Lab, Sunnyvale, CA 94085 USA
Funding
US National Science Foundation;
Keywords
Deep learning; Nonvolatile memory; Computer architecture; Acceleration; Artificial neural networks; Resistance; Microprocessors; STT-MRAM; deep learning acceleration; processing-in-memory; device and architecture co-design; DRAM;
DOI
10.1109/TPDS.2019.2937517
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Discipline classification code
081202;
Abstract
A large variety of applications rely on deep learning to process big data, learn sophisticated features, and perform complicated tasks. Exploiting the unique characteristics of emerging non-volatile memory (NVM), including the crossbar array structure and gray-scale cell resistances, to perform neural network (NN) computation is a well-studied approach to accelerating deep learning applications. Compared to other NVM technologies, STT-MRAM has unique advantages for NN computation. However, state-of-the-art research has not utilized STT-MRAM for deep learning acceleration due to its device- and architecture-level challenges. This paper therefore enables STT-MRAM, for the first time, as an effective and practical deep learning accelerator. In particular, it proposes iCELIA, a full-stack framework spanning multiple design levels, including device-level fabrication, circuit-level enhancements, architecture-level synaptic weight quantization, and system-level accelerator design. The primary contributions of iCELIA over our prior work CELIA are a new non-uniform weight quantization scheme and a significantly enhanced accelerator system design. The proposed framework cohesively mitigates the model accuracy loss caused by reduced data precision, constructing a comprehensive STT-MRAM accelerator system for fast NN computation with high energy efficiency and low cost.
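The abstract does not detail iCELIA's quantization algorithm, but the general idea of non-uniform weight quantization for mapping NN weights onto a small set of gray-scale cell resistances can be illustrated with a minimal sketch. The snippet below assumes a clustering-based quantizer (1-D k-means over the weight distribution) and a toy linear weight-to-conductance mapping; the function name nonuniform_quantize, the level count, and the mapping are illustrative assumptions, not the paper's actual scheme.

import numpy as np

def nonuniform_quantize(weights, n_levels=16, n_iters=50):
    # Quantize a weight tensor to n_levels non-uniformly spaced values using
    # 1-D k-means (Lloyd's algorithm), so the levels follow the weight
    # distribution instead of being evenly spaced.
    w = weights.ravel()
    # Start the levels at evenly spaced quantiles of the weight distribution.
    centroids = np.quantile(w, np.linspace(0.0, 1.0, n_levels))
    for _ in range(n_iters):
        # Assign every weight to its nearest level.
        idx = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        # Move each level to the mean of the weights assigned to it.
        for k in range(n_levels):
            if np.any(idx == k):
                centroids[k] = w[idx == k].mean()
    return centroids[idx].reshape(weights.shape), centroids

# Example: quantize a random layer to 16 levels (4 bits per cell).
rng = np.random.default_rng(0)
layer_w = rng.normal(0.0, 0.05, size=(256, 128))
q_w, levels = nonuniform_quantize(layer_w, n_levels=16)
print("distinct levels:", np.unique(q_w).size)
print("max quantization error:", np.max(np.abs(layer_w - q_w)))

# In a crossbar, each output current sums conductance * input voltage, which
# realizes a matrix-vector multiply in the analog domain (toy mapping here).
g = (q_w - q_w.min()) / (q_w.max() - q_w.min())  # conductances scaled to [0, 1]
v_in = rng.normal(size=128)
i_out = g @ v_in

Because the levels track the weight distribution rather than a uniform grid, dense regions of the distribution receive finer resolution, which is the usual motivation for non-uniform over uniform quantization when cell resistance levels are scarce.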
Pages: 408-422
Number of pages: 15
Related papers
50 records in total
  • [1] CELIA: A Device and Architecture Co-Design Framework for STT-MRAM-Based Deep Learning Acceleration
    Yan, Hao
    Cherian, Hebin R.
    Ahn, Ethan C.
    Duan, Lide
    INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS 2018), 2018, : 149 - 159
  • [2] STT-MRAM-Based Strong PUF Architecture
    Vatajelu, Elena Ioana
    Di Natale, Giorgio
    Torres, Lionel
    Prinetto, Paolo
    2015 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI, 2015, : 467 - 472
  • [3] STT-MRAM-Based Reliable Weak PUF
    Hu, Yupeng
    Wu, Linjun
    Chen, Zhuojun
    Huang, Yun
    Xu, Xiaolin
    Li, Keqin
    Zhang, Jiliang
    IEEE TRANSACTIONS ON COMPUTERS, 2022, 71 (07) : 1564 - 1574
  • [4] A full-stack platform for spiking deep learning
Pan, Jie
    Nature Computational Science, 2023, 3 : 913 - 913
  • [5] A full-stack platform for spiking deep learning
    Pan, Jie
NATURE COMPUTATIONAL SCIENCE, 2023, 3 (11): 913 - 913
  • [6] Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs
    Liu, Shuanglong
    Fan, Hongxiang
    Ferianc, Martin
    Niu, Xinyu
    Shi, Huifeng
    Luk, Wayne
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (08) : 3974 - 3987
  • [7] STT-MRAM-Based Multicontext FPGA for Multithreading Computing Environment
    Kim, Jeongbin
    Song, Yongwoon
    Cho, Kyungseon
    Lee, Hyukjun
    Yoon, Hongil
    Chung, Eui-Young
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (05) : 1330 - 1343
  • [8] Improving Reliability of STT-MRAM-Based Smart Material Implication
    Lanuzza, Marco
    Moposita, Tatiana
    2024 IEEE 24TH INTERNATIONAL CONFERENCE ON NANOTECHNOLOGY, NANO 2024, 2024, : 523 - 526
  • [9] CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (TinyML) Acceleration on FPGAs
    Prakash, Shvetank
    Callahan, Tim
    Bushagour, Joseph
    Banbury, Colby
    Green, Alan V.
    Warden, Pete
    Ansell, Tim
    Reddi, Vijay Janapa
    2023 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE, ISPASS, 2023, : 157 - 167
  • [10] A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators
    Zhang, Dan
    Huda, Safeen
    Songhori, Ebrahim
    Prabhu, Kartik
Le, Quoc
    Goldie, Anna
    Mirhoseini, Azalia
    ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, 2022, : 27 - 42