Design of Processing-in-Memory With Triple Computational Path and Sparsity Handling for Energy-Efficient DNN Training

Cited by: 1
Authors
Han, Wontak [1 ]
Heo, Jaehoon [1 ]
Kim, Junsoo [1 ]
Lim, Sukbin [1 ]
Kim, Joo-Young [1 ]
Affiliation
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Daejeon 34141, South Korea
Keywords
Training; Computational modeling; Computer architecture; Deep learning; Circuits and systems; Power demand; Neurons; Accelerator architecture; machine learning; processing-in-memory architecture; bit-serial operation; inference; training; sparsity handling; SRAM; energy-efficient architecture; DEEP NEURAL-NETWORKS; SRAM; ACCELERATOR; MACRO
DOI
10.1109/JETCAS.2022.3168852
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
As machine learning (ML) and artificial intelligence (AI) have become mainstream technologies, many accelerators have been proposed to cope with their computation kernels. However, these accelerators access external memory frequently because of the large size of deep neural network models, and thus suffer from the von Neumann bottleneck. Moreover, as privacy issues become more critical, on-device training is emerging as a solution; it is challenging, however, because training must run under a limited power budget while requiring far more computation and memory access than inference. In this paper, we present T-PIM, an energy-efficient processing-in-memory (PIM) architecture that supports end-to-end on-device training. Its macro design includes an 8T-SRAM cell-based PIM block that computes AND operations in memory and three computational datapaths for end-to-end training. The three computational paths integrate arithmetic units for forward propagation, backward propagation, and gradient calculation with weight update, respectively, allowing the weight data to remain stationary in memory. T-PIM also supports variable bit precision to cover various ML scenarios: fully variable input bit precision with 2-bit, 4-bit, 8-bit, or 16-bit weight precision for forward propagation, and the same input precision with 16-bit weight precision for backward propagation. In addition, T-PIM implements sparsity handling schemes that skip computation for zero-valued input data and turn off the arithmetic units for zero-valued weight data, reducing both unnecessary computation and leakage power. Finally, we fabricate the T-PIM chip on a 5.04 mm² die in a 28-nm CMOS logic process. It operates at 50-280 MHz with a supply voltage of 0.75-1.05 V, dissipating 5.25-51.23 mW in inference and 6.10-37.75 mW in training. As a result, it achieves 17.90-161.08 TOPS/W energy efficiency for inference with 1-bit activations and 2-bit weights, and 0.84-7.59 TOPS/W for training with 8-bit activations/errors and 16-bit weights. In conclusion, T-PIM is the first PIM chip that supports end-to-end training, demonstrating a 2.02 times performance improvement over the latest PIM chip that only partially supports training.
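The two mechanisms the abstract relies on, bit-serial computation built from in-memory AND operations and zero-skipping sparsity handling, can be illustrated with a short functional sketch. The following is a minimal Python model, not the T-PIM hardware; the function and variable names are ours, and it assumes unsigned activations for simplicity.

```python
# Minimal sketch (not T-PIM RTL) of two ideas from the abstract:
# (1) bit-serial multiply-accumulate decomposed into AND operations between
#     a single input bit and stored weights, the primitive an 8T-SRAM PIM
#     array can evaluate on its bitlines, and
# (2) input-sparsity handling that skips all-zero bit-planes entirely.

def bit_serial_mac(inputs, weights, input_bits=8):
    """Compute sum(inputs[i] * weights[i]) one input bit-plane at a time.

    Assumes unsigned activations; weights may be signed.
    """
    acc = 0
    for b in range(input_bits):                  # stream input bits LSB-first
        # Extract bit-plane b of every input (what gets broadcast per cycle).
        bit_plane = [(x >> b) & 1 for x in inputs]

        # Sparsity handling: an all-zero bit-plane contributes nothing, so
        # the cycle is skipped (no bitline activity, no adder-tree switching).
        if not any(bit_plane):
            continue

        # In-memory AND: bit AND weight gates each weight by one input bit;
        # an adder tree then reduces the surviving partial products.
        partial = sum(w for bit, w in zip(bit_plane, weights) if bit)

        acc += partial << b                      # weight by bit significance
    return acc

# Quick check against ordinary arithmetic.
xs = [0, 3, 0, 12, 0, 7]          # sparse 8-bit activations
ws = [5, -2, 9, 4, 1, 6]          # stored weights (e.g., 16-bit in training)
assert bit_serial_mac(xs, ws) == sum(x * w for x, w in zip(xs, ws))
```

In this toy run only four of the eight bit-planes carry any nonzero bit, so half the cycles are skipped outright; that cycle-level skipping is the kind of saving the sparsity-handling scheme in the abstract targets.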
Pages: 354-366
Page count: 13