Memory-Efficient Batch Normalization by One-Pass Computation for On-Device Training

Times Cited: 0
Authors
Dai, He [1 ]
Wang, Hang [1 ]
Zhang, Xuchong [2 ]
Sun, Hongbin [2 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Microelect, Xian 710049, Shaanxi, Peoples R China
[2] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Coll Artificial Intelligence, Xian 710049, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Systolic arrays; Backpropagation; Artificial neural networks; Micromechanical devices; Feedforward systems; Memory management; Memory-efficient accelerator; deep neural networks; batch normalization; on-device training; one-pass computation;
DOI
10.1109/TCSII.2024.3354738
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
Batch normalization (BN) has become ubiquitous in modern deep learning architectures because of its remarkable improvement in deep neural network (DNN) training performance. However, the two-pass computation in BN training, statistical estimation followed by element-wise normalization, requires two accesses to the input data, resulting in a large increase in off-chip memory traffic during DNN training. In this brief, we propose a novel accelerator, named one-pass normalizer (OPN), to achieve memory-efficient BN for on-device training. Specifically, in terms of dataflow, we propose one-pass computation based on sampling-based range normalization and sparse data recovery techniques to reduce BN off-chip memory access. Regarding the OPN circuit, we propose channel-wise constant extraction to achieve a compact design. Experimental results show that the one-pass computation reduces off-chip memory access of BN by 2.0x to 3.8x compared with previous state-of-the-art designs while maintaining training performance. Moreover, the channel-wise constant extraction reduces the gate count and power consumption of OPN by 56% and 73%, respectively.
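To make the memory-traffic argument concrete, the following sketch contrasts standard two-pass BN (two full reads of the input) with a one-pass variant that estimates per-channel statistics from a small sample and uses a min/max range in place of the standard deviation. This is a hedged illustration of the general "sampling-based range normalization" idea named in the abstract, not the paper's actual OPN dataflow; the function names, the sampling fraction, and the range-based scale are assumptions for illustration only.

```python
import numpy as np

def bn_two_pass(x, eps=1e-5):
    """Standard BN over a (batch, channels) tensor.

    Pass 1 reads x to compute per-channel mean/variance;
    pass 2 reads x again to normalize element-wise.
    """
    mean = x.mean(axis=0)          # first pass over x
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)   # second pass over x

def bn_one_pass_range(x, sample_frac=0.1, eps=1e-5, seed=0):
    """Illustrative one-pass variant (hypothetical, not the paper's OPN).

    Statistics come from a small on-chip sample, and the scale is a
    min/max range estimate, so the full tensor only needs one pass
    for the normalization itself.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    idx = rng.choice(n, size=max(1, int(n * sample_frac)), replace=False)
    sample = x[idx]                               # small sample, cheap to buffer
    mean = sample.mean(axis=0)
    # Range-based scale estimate standing in for the standard deviation.
    scale = (sample.max(axis=0) - sample.min(axis=0)) / 2.0 + eps
    return (x - mean) / scale                     # single pass over full x
```

The point of the contrast is that `bn_two_pass` touches every element of `x` twice (once for statistics, once for normalization), which on an accelerator translates into two rounds of off-chip reads, whereas the one-pass variant buffers only a sample for statistics and streams the full tensor once.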
Pages: 3186-3190
Number of pages: 5