Enabling High-Efficient ReRAM-Based CNN Training Via Exploiting Crossbar-Level Insignificant Writing Elimination

Cited: 1
Authors
Wang, Lening [1 ]
Wan, Qiyu [1 ]
Ma, Peixun [2 ]
Wang, Jing [3 ]
Chen, Minsong [4 ]
Song, Shuaiwen Leon [5 ]
Fu, Xin [6 ]
Affiliations
[1] Univ Houston, Dept Elect Engn, Houston, TX 77204 USA
[2] Hunan Polytech Environm & Biol, Ecol Livable Coll, Hengyang 421200, Hunan, Peoples R China
[3] Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China
[4] East China Normal Univ, Software Engn Inst, Shanghai 200062, Peoples R China
[5] Univ Sydney, Sch Comp Sci, Sydney, NSW 2008, Australia
[6] Univ Houston, Dept Elect & Comp Engn, Houston, TX 77204 USA
Keywords
CNN; ReRAM; training; PERFORMANCE; ENERGY; NOISE;
DOI
10.1109/TC.2023.3288763
CLC number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Convolutional neural networks (CNNs) have been widely adopted in many deep learning applications. However, training a deep CNN requires intensive data transfer, which is both time- and energy-consuming. Using resistive random-access memory (ReRAM) to process data locally in memory is an emerging solution that eliminates this massive data movement. However, current ReRAM-based processing-in-memory (PIM) accelerators cannot efficiently support training, because frequent, high-cost ReRAM write operations degrade delay, energy, and ReRAM lifetime. In this paper, we observe that activation-induced and weight-update-induced write operations dominate the training energy on ReRAM-based accelerators. We then exploit a new angle on intermediate-data sparsity (e.g., activations and errors) that fits the unique computation pattern of ReRAM crossbars to effectively eliminate insignificant ReRAM writes, thus enabling highly efficient CNN training without hurting training accuracy. Experimental results show that our proposed scheme achieves an average 4.97x (19.23x) energy saving and 1.38x (30.08x) speedup compared to the state-of-the-art ReRAM-based accelerator (GPU). Our scheme also achieves a 4.55x lifetime enhancement compared to the state-of-the-art ReRAM accelerator.
Pages: 3218-3230
Page count: 13