PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-Efficient ReRAM

Cited by: 47
Authors
Ankit, Aayush [1 ]
El Hajj, Izzat [2 ]
Chalamalasetti, Sai Rahul [3 ]
Agarwal, Sapan [4 ]
Marinella, Matthew [4 ]
Foltin, Martin [3 ]
Strachan, John Paul [3 ]
Milojicic, Dejan [3 ]
Hwu, Wen-Mei [5 ]
Roy, Kaushik [1 ]
Affiliations
[1] Purdue Univ, Dept Elect & Comp Engn, W Lafayette, IN 47907 USA
[2] Amer Univ Beirut, Dept Comp Sci, Beirut 11072020, Lebanon
[3] Hewlett Packard Labs, San Jose, CA 95002 USA
[4] Sandia Natl Labs, Livermore, CA 94550 USA
[5] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL 61820 USA
Keywords
Accelerators; resistive random-access memory (ReRAM); neural networks; training
DOI
10.1109/TC.2020.2998456
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training, both fully digital designs and hybrid digital-analog designs that use resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars at performing the matrix-vector multiplication operations that are prevalent in training. However, they still suffer from inefficiency because they use serial reads and writes to perform the weight gradient and update step. A few works have demonstrated that outer products can be performed inside crossbars, which makes it possible to realize the weight gradient and update step without serial reads and writes. However, these works have been limited to low-precision operations, which are insufficient for typical training workloads, and have been confined to a limited set of training algorithms for fully-connected layers only. To address these limitations, we propose a bit-slicing technique for enhancing the precision of ReRAM-based outer products, which differs substantially from bit-slicing schemes aimed at matrix-vector multiplication alone. We incorporate this technique into a crossbar architecture with three variants tailored to different training algorithms. To evaluate our design on different types of layers (fully-connected, convolutional, etc.) and training algorithms, we develop PANTHER, an ISA-programmable training accelerator with compiler support; our design can also be integrated into other accelerators in the literature to enhance their efficiency. Our evaluation shows that PANTHER achieves up to 8.02x, 54.21x, and 103x energy reductions, as well as 7.16x, 4.02x, and 16x execution time reductions, compared to digital accelerators, ReRAM-based accelerators, and GPUs, respectively.
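To make the abstract's two core operations concrete, the following is a minimal NumPy sketch of (1) the weight gradient and update step expressed as a rank-1 outer product, which is what in-crossbar outer products avoid serializing, and (2) bit-slicing, which spreads each weight across several low-precision slices (one per crossbar) and recovers a full-precision matrix-vector product by a shift-and-add reduction. All shapes, variable names, and the unsigned toy quantization are illustrative assumptions, not details of PANTHER's actual datapath.

```python
import numpy as np

# Conceptual sketch only: illustrative shapes and a toy unsigned
# fixed-point quantization, not the paper's implementation.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)       # layer input activations
delta = rng.standard_normal(32)   # backpropagated error at the layer output
lr = 0.01                         # learning rate (illustrative)

# (1) Weight gradient + update as a rank-1 outer product. Earlier ReRAM
# accelerators realized this with serial crossbar reads and writes; an
# in-crossbar outer product applies the whole update in parallel.
W = rng.standard_normal((32, 64))
W -= lr * np.outer(delta, x)

# (2) Bit-slicing: store each weight across several low-precision slices,
# one slice per crossbar. Here 4 slices x 4 bits emulate 16-bit weights.
# Real designs handle signed weights (e.g., with differential cell pairs);
# this sketch sidesteps that with an unsigned toy quantization.
bits, n_slices = 4, 4
W_q = np.round((W - W.min()) / (W.max() - W.min()) * (2**16 - 1)).astype(np.int64)
mask = (1 << bits) - 1
slices = [(W_q >> (bits * i)) & mask for i in range(n_slices)]

# The full-precision matrix-vector product is recovered by a shift-and-add
# reduction over the per-slice (low-precision) crossbar results.
y = sum((s @ x) * float(2 ** (bits * i)) for i, s in enumerate(slices))
assert np.allclose(y, W_q @ x)
```

The shift-and-add reduction in the last lines is why the per-slice precision can stay low (matching what a single ReRAM cell can store) while the end-to-end arithmetic precision remains high; the paper's contribution is extending this idea from matrix-vector multiplication to the outer-product update itself.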
Pages: 1128-1142
Page count: 15