PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-Efficient ReRAM

Cited by: 47
Authors
Ankit, Aayush [1 ]
El Hajj, Izzat [2 ]
Chalamalasetti, Sai Rahul [3 ]
Agarwal, Sapan [4 ]
Marinella, Matthew [4 ]
Foltin, Martin [3 ]
Strachan, John Paul [3 ]
Milojicic, Dejan [3 ]
Hwu, Wen-Mei [5 ]
Roy, Kaushik [1 ]
Affiliations
[1] Purdue Univ, Dept Elect & Comp Engn, W Lafayette, IN 47907 USA
[2] Amer Univ Beirut, Dept Comp Sci, Beirut 11072020, Lebanon
[3] Hewlett Packard Labs, San Jose, CA 95002 USA
[4] Sandia Natl Labs, Livermore, CA 94550 USA
[5] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL 61820 USA
Keywords
Accelerators; resistive random-access memory (ReRAM); neural networks; training
DOI
10.1109/TC.2020.2998456
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training, both fully digital designs and hybrid digital-analog designs that use resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars at performing the matrix-vector multiplication operations that are prevalent in training. However, they still suffer from inefficiency because they use serial reads and writes to perform the weight gradient and update step. A few works have demonstrated that outer products can be performed inside crossbars, which makes it possible to realize the weight gradient and update step without serial reads and writes. However, these works have been limited to low-precision operations, which are insufficient for typical training workloads, and have been confined to a limited set of training algorithms for fully-connected layers only. To address these limitations, we propose a bit-slicing technique for enhancing the precision of ReRAM-based outer products, which differs substantially from bit-slicing schemes aimed at matrix-vector multiplication alone. We incorporate this technique into a crossbar architecture with three variants tailored to different training algorithms. To evaluate our design on different types of layers (fully-connected, convolutional, etc.) and training algorithms, we develop PANTHER, an ISA-programmable training accelerator with compiler support; our design can also be integrated into other accelerators in the literature to enhance their efficiency. Our evaluation shows that PANTHER achieves up to 8.02x, 54.21x, and 103x energy reductions, as well as 7.16x, 4.02x, and 16x execution time reductions, compared to digital accelerators, ReRAM-based accelerators, and GPUs, respectively.
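To make the abstract's two core operations concrete, the following is a minimal NumPy sketch of (1) the weight gradient and update step expressed as a rank-1 outer product, which is what in-crossbar outer products avoid serializing, and (2) bit-slicing, which spreads each weight across several low-precision slices (one per crossbar) and recovers a full-precision matrix-vector product by a shift-and-add reduction. All shapes, variable names, and the unsigned toy quantization are illustrative assumptions, not details of PANTHER's actual datapath.

```python
import numpy as np

# Conceptual sketch only: illustrative shapes and a toy unsigned
# fixed-point quantization, not the paper's implementation.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)       # layer input activations
delta = rng.standard_normal(32)   # backpropagated error at the layer output
lr = 0.01                         # learning rate (illustrative)

# (1) Weight gradient + update as a rank-1 outer product. Earlier ReRAM
# accelerators realized this with serial crossbar reads and writes; an
# in-crossbar outer product applies the whole update in parallel.
W = rng.standard_normal((32, 64))
W -= lr * np.outer(delta, x)

# (2) Bit-slicing: store each weight across several low-precision slices,
# one slice per crossbar. Here 4 slices x 4 bits emulate 16-bit weights.
# Real designs handle signed weights (e.g., with differential cell pairs);
# this sketch sidesteps that with an unsigned toy quantization.
bits, n_slices = 4, 4
W_q = np.round((W - W.min()) / (W.max() - W.min()) * (2**16 - 1)).astype(np.int64)
mask = (1 << bits) - 1
slices = [(W_q >> (bits * i)) & mask for i in range(n_slices)]

# The full-precision matrix-vector product is recovered by a shift-and-add
# reduction over the per-slice (low-precision) crossbar results.
y = sum((s @ x) * float(2 ** (bits * i)) for i, s in enumerate(slices))
assert np.allclose(y, W_q @ x)
```

The shift-and-add reduction in the last lines is why the per-slice precision can stay low (matching what a single ReRAM cell can store) while the end-to-end arithmetic precision remains high; the paper's contribution is extending this idea from matrix-vector multiplication to the outer-product update itself.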
Pages: 1128-1142
Page count: 15