Design of Processing-in-Memory With Triple Computational Path and Sparsity Handling for Energy-Efficient DNN Training

Cited by: 1
Authors
Han, Wontak [1 ]
Heo, Jaehoon [1 ]
Kim, Junsoo [1 ]
Lim, Sukbin [1 ]
Kim, Joo-Young [1 ]
Affiliations
[1] Korea Advanced Institute of Science and Technology (KAIST), Department of Electrical Engineering, Daejeon 34141, South Korea
Keywords
Training; computational modeling; computer architecture; deep learning; circuits and systems; power demand; neurons; accelerator architecture; machine learning; processing-in-memory architecture; bit-serial operation; inference; sparsity handling; SRAM; energy-efficient architecture; deep neural networks; accelerator; macro
DOI
10.1109/JETCAS.2022.3168852
CLC numbers
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline codes
0808; 0809
Abstract
As machine learning (ML) and artificial intelligence (AI) have become mainstream technologies, many accelerators have been proposed to cope with their computation kernels. However, they access external memory frequently due to the large size of deep neural network models, and thus suffer from the von Neumann bottleneck. Moreover, as privacy issues grow more critical, on-device training is emerging as a solution, yet it is challenging because training must run under a limited power budget while requiring far more computation and memory access than inference. In this paper, we present T-PIM, an energy-efficient processing-in-memory (PIM) architecture that supports end-to-end on-device training. Its macro design includes an 8T-SRAM cell-based PIM block that computes in-memory AND operations, plus three computational datapaths for end-to-end training: they integrate arithmetic units for forward propagation, backward propagation, and gradient calculation with weight update, respectively, allowing the weight data to remain stationary in the memory. T-PIM also supports variable bit precision to cover various ML scenarios: fully variable input bit precision with 2-bit, 4-bit, 8-bit, or 16-bit weight precision for forward propagation, and the same input precision with 16-bit weight precision for backward propagation. In addition, T-PIM implements sparsity-handling schemes that skip computation for sparse input data and turn off the arithmetic units for sparse weight data, reducing both unnecessary computation and leakage power. Finally, we fabricate the T-PIM chip on a 5.04 mm² die in a 28-nm CMOS logic process. It operates at 50-280 MHz with a supply voltage of 0.75-1.05 V, dissipating 5.25-51.23 mW in inference and 6.10-37.75 mW in training. As a result, it achieves 17.90-161.08 TOPS/W energy efficiency for inference with 1-bit activations and 2-bit weights, and 0.84-7.59 TOPS/W for training with 8-bit activations/errors and 16-bit weights. In conclusion, T-PIM is the first PIM chip that supports end-to-end training, demonstrating a 2.02× performance improvement over the latest PIM chip that only partially supports training.
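The abstract's in-memory compute scheme pairs bit-serial inputs with AND operations inside the SRAM array. Below is a minimal behavioral sketch in Python of how such a bit-serial, AND-based multiply-accumulate with input zero-skipping can work; it models the dataflow only (unsigned operands, illustrative names), not the chip's actual circuits.

    import numpy as np

    def bit_serial_mac(x, w, in_bits=8, w_bits=16):
        # One input bit-plane is broadcast per cycle; the array ANDs it with
        # the stored weight bits, a popcount forms the partial sum, and the
        # partial sums are shift-accumulated by bit significance.
        x = np.asarray(x, dtype=np.int64)
        w = np.asarray(w, dtype=np.int64)
        acc = 0
        for i in range(in_bits):                  # serialize the input, LSB first
            x_plane = (x >> i) & 1
            if not x_plane.any():                 # sparsity handling: skip all-zero bit-planes
                continue
            for j in range(w_bits):               # weight bits read column-wise
                w_plane = (w >> j) & 1
                acc += int(np.sum(x_plane & w_plane)) << (i + j)  # AND + popcount + shift-add
        return acc

    # The bit-serial result matches an ordinary dot product:
    x = np.random.randint(0, 1 << 8, size=64)
    w = np.random.randint(0, 1 << 16, size=64)
    assert bit_serial_mac(x, w) == int(np.dot(x, w))

The "triple computational path" keeps the weights stationary while forward, backward, and gradient-update traffic flows past them. The following NumPy sketch shows that dataflow for one fully connected layer; it uses floating point for clarity, whereas the chip operates bit-serially in fixed point, and the function names are ours, not the paper's.

    def forward(W, x):
        # Forward path: y = W x, streaming activations past the stationary W.
        return W @ x

    def backward(W, dy):
        # Backward path: dx = W^T dy, reusing the same in-place W.
        return W.T @ dy

    def grad_update(W, x, dy, lr=1e-2):
        # Gradient path: dW = dy x^T, applied where the weights reside.
        W -= lr * np.outer(dy, x)
        return W

    W = np.random.randn(4, 8)
    x_in, dy = np.random.randn(8), np.random.randn(4)
    y, dx = forward(W, x_in), backward(W, dy)
    W = grad_update(W, x_in, dy)   # the weights never leave the array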
Pages: 354-366
Page count: 13
Related papers (50 total)
  • [1] Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach
    Liu, Jiawen
    Zhao, Hengyu
    Ogleari, Matheus Almeida
    Li, Dong
    Zhao, Jishen
    2018 51ST ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), 2018, : 655 - 668
  • [2] An Energy-Efficient Quantized and Regularized Training Framework For Processing-In-Memory Accelerators
    Sun, Hanbo
    Zhu, Zhenhua
    Cai, Yi
    Chen, Xiaoming
    Wang, Yu
    Yang, Huazhong
    2020 25TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC 2020, 2020, : 325 - 330
  • [3] An Efficient Hardware Architecture for DNN Training by Exploiting Triple Sparsity
    Huang, Jian
    Lu, Jinming
    Wang, Zhongfeng
    2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 2802 - 2805
  • [4] Z-PIM: An Energy-Efficient Sparsity Aware Processing-In-Memory Architecture with Fully-Variable Weight Precision
    Kim, Ji-Hoon
    Lee, Juhyoung
    Lee, Jinsu
    Yoo, Hoi-Jun
    Kim, Joo-Young
    2020 IEEE SYMPOSIUM ON VLSI CIRCUITS, 2020,
  • [5] ReverSearch: Search-based energy-efficient Processing-in-Memory Architecture
    Li, Weihang
    Chang, Liang
    Fan, Jiajing
    Zhao, Xin
    Zhang, Hengtan
    Lin, Shuisheng
    Zhou, Jun
    2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 409 - 413
  • [6] T-PIM: An Energy-Efficient Processing-in-Memory Accelerator for End-to-End On-Device Training
    Heo, Jaehoon
    Kim, Junsoo
    Lim, Sukbin
    Han, Wontak
    Kim, Joo-Young
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2023, 58 (03) : 600 - 613
  • [7] Parasitic-Aware Modeling and Neural Network Training Scheme for Energy-Efficient Processing-in-Memory With Resistive Crossbar Array
    Cao, Tiancheng
    Liu, Chen
    Gao, Yuan
    Goh, Wang Ling
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2022, 12 (02) : 436 - 444
  • [8] An Energy-efficient Processing-in-memory Architecture for Long Short Term Memory in Spin Orbit Torque MRAM
    Kim, Kyeonghan
    Shin, Hyein
    Sim, Jaehyeong
    Kang, Myeonggu
    Kim, Lee-Sup
    2019 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2019,
  • [9] RIME: A Scalable and Energy-Efficient Processing-In-Memory Architecture for Floating-Point Operations
    Lu, Zhaojun
    Arafin, Md Tanvir
    Qu, Gang
    2021 26TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2021, : 120 - 125
  • [10] GANPU: An Energy-Efficient Multi-DNN Training Processor for GANs With Speculative Dual-Sparsity Exploitation
    Kang, Sanghoon
    Han, Donghyeon
    Lee, Juhyoung
    Im, Dongseok
    Kim, Sangyeob
    Kim, Soyeon
    Ryu, Junha
    Yoo, Hoi-Jun
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2021, 56 (09) : 2845 - 2857