iCELIA: A Full-Stack Framework for STT-MRAM-Based Deep Learning Acceleration

Cited by: 14
Authors
Yan, Hao [1 ]
Cherian, Hebin R. [2 ]
Ahn, Ethan C. [2 ]
Qian, Xuehai [3 ,4 ]
Duan, Lide [5 ]
Affiliations
[1] Samsung, Austin, TX 78746 USA
[2] Univ Texas San Antonio, Dept Elect & Comp Engn, San Antonio, TX 78249 USA
[3] Univ Southern Calif, Ming Hsieh Dept Elect Engn, Los Angeles, CA 90089 USA
[4] Univ Southern Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
[5] Alibaba DAMO Acad, Comp Technol Lab, Sunnyvale, CA 94085 USA
Funding
US National Science Foundation;
Keywords
Deep learning; Nonvolatile memory; Computer architecture; Acceleration; Artificial neural networks; Resistance; Microprocessors; STT-MRAM; deep learning acceleration; processing-in-memory; device and architecture co-design; DRAM;
DOI
10.1109/TPDS.2019.2937517
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Discipline classification code
081202;
Abstract
A large variety of applications rely on deep learning to process big data, learn sophisticated features, and perform complicated tasks. Exploiting the unique characteristics of emerging non-volatile memory (NVM), including the crossbar array structure and gray-scale cell resistances, to perform neural network (NN) computation is a well-studied approach to accelerating deep learning applications. Compared to other NVM technologies, STT-MRAM has unique advantages for NN computation. However, state-of-the-art research has not utilized STT-MRAM for deep learning acceleration due to its device- and architecture-level challenges. This paper therefore enables STT-MRAM, for the first time, as an effective and practical deep learning accelerator. In particular, it proposes iCELIA, a full-stack framework spanning multiple design levels, including device-level fabrication, circuit-level enhancements, architecture-level synaptic weight quantization, and system-level accelerator design. The primary contributions of iCELIA over our prior work CELIA are a new non-uniform weight quantization scheme and a significantly enhanced accelerator system design. The proposed framework cohesively mitigates the model accuracy loss caused by reduced data precision, constructing a comprehensive STT-MRAM accelerator system for fast NN computation with high energy efficiency and low cost.
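The abstract does not detail iCELIA's quantization algorithm, but the general idea of non-uniform weight quantization for mapping NN weights onto a small set of gray-scale cell resistances can be illustrated with a minimal sketch. The snippet below assumes a clustering-based quantizer (1-D k-means over the weight distribution) and a toy linear weight-to-conductance mapping; the function name nonuniform_quantize, the level count, and the mapping are illustrative assumptions, not the paper's actual scheme.

import numpy as np

def nonuniform_quantize(weights, n_levels=16, n_iters=50):
    # Quantize a weight tensor to n_levels non-uniformly spaced values using
    # 1-D k-means (Lloyd's algorithm), so the levels follow the weight
    # distribution instead of being evenly spaced.
    w = weights.ravel()
    # Start the levels at evenly spaced quantiles of the weight distribution.
    centroids = np.quantile(w, np.linspace(0.0, 1.0, n_levels))
    for _ in range(n_iters):
        # Assign every weight to its nearest level.
        idx = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        # Move each level to the mean of the weights assigned to it.
        for k in range(n_levels):
            if np.any(idx == k):
                centroids[k] = w[idx == k].mean()
    return centroids[idx].reshape(weights.shape), centroids

# Example: quantize a random layer to 16 levels (4 bits per cell).
rng = np.random.default_rng(0)
layer_w = rng.normal(0.0, 0.05, size=(256, 128))
q_w, levels = nonuniform_quantize(layer_w, n_levels=16)
print("distinct levels:", np.unique(q_w).size)
print("max quantization error:", np.max(np.abs(layer_w - q_w)))

# In a crossbar, each output current sums conductance * input voltage, which
# realizes a matrix-vector multiply in the analog domain (toy mapping here).
g = (q_w - q_w.min()) / (q_w.max() - q_w.min())  # conductances scaled to [0, 1]
v_in = rng.normal(size=128)
i_out = g @ v_in

Because the levels track the weight distribution rather than a uniform grid, dense regions of the distribution receive finer resolution, which is the usual motivation for non-uniform over uniform quantization when cell resistance levels are scarce.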
Pages: 408-422
Number of pages: 15
Related papers
50 records in total
  • [1] CELIA: A Device and Architecture Co-Design Framework for STT-MRAM-Based Deep Learning Acceleration
    Yan, Hao
    Cherian, Hebin R.
    Ahn, Ethan C.
    Duan, Lide
    INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS 2018), 2018, : 149 - 159
  • [2] STT-MRAM-Based Strong PUF Architecture
    Vatajelu, Elena Ioana
    Di Natale, Giorgio
    Torres, Lionel
    Prinetto, Paolo
    2015 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI, 2015, : 467 - 472
  • [3] STT-MRAM-Based Reliable Weak PUF
    Hu, Yupeng
    Wu, Linjun
    Chen, Zhuojun
    Huang, Yun
    Xu, Xiaolin
    Li, Keqin
    Zhang, Jiliang
    IEEE TRANSACTIONS ON COMPUTERS, 2022, 71 (07) : 1564 - 1574
  • [4] A full-stack platform for spiking deep learning
Pan, Jie
    Nature Computational Science, 2023, 3 : 913 - 913
  • [5] A full-stack platform for spiking deep learning
    Pan, Jie
NATURE COMPUTATIONAL SCIENCE, 2023, 3 (11): 913 - 913
  • [6] Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs
    Liu, Shuanglong
    Fan, Hongxiang
    Ferianc, Martin
    Niu, Xinyu
    Shi, Huifeng
    Luk, Wayne
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (08) : 3974 - 3987
  • [7] STT-MRAM-Based Multicontext FPGA for Multithreading Computing Environment
    Kim, Jeongbin
    Song, Yongwoon
    Cho, Kyungseon
    Lee, Hyukjun
    Yoon, Hongil
    Chung, Eui-Young
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (05) : 1330 - 1343
  • [8] Improving Reliability of STT-MRAM-Based Smart Material Implication
    Lanuzza, Marco
    Moposita, Tatiana
    2024 IEEE 24TH INTERNATIONAL CONFERENCE ON NANOTECHNOLOGY, NANO 2024, 2024, : 523 - 526
  • [9] CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (TinyML) Acceleration on FPGAs
    Prakash, Shvetank
    Callahan, Tim
    Bushagour, Joseph
    Banbury, Colby
    Green, Alan V.
    Warden, Pete
    Ansell, Tim
    Reddi, Vijay Janapa
    2023 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE, ISPASS, 2023, : 157 - 167
  • [10] A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators
    Zhang, Dan
    Huda, Safeen
    Songhori, Ebrahim
    Prabhu, Kartik
Le, Quoc
    Goldie, Anna
    Mirhoseini, Azalia
    ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, 2022, : 27 - 42