Gradient Descent Using Stochastic Circuits for Efficient Training of Learning Machines

Cited by: 26
Authors
Liu, Siting [1 ]
Jiang, Honglan [1 ]
Liu, Leibo [2 ]
Han, Jie [1 ]
Affiliations
[1] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G1H9, Canada
[2] Tsinghua Univ, Inst Microelect, Beijing 100084, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Adaptive filter (AF); gradient descent (GD); machine learning; neural networks (NNs); softmax regression (SR); stochastic computing (SC); IMPLEMENTATION; COMPUTATION; DESIGN;
DOI
10.1109/TCAD.2018.2858363
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Gradient descent (GD) is a widely used optimization algorithm in machine learning. In this paper, a novel stochastic computing GD circuit (SC-GDC) is proposed by encoding the gradient information in stochastic sequences. Inspired by the structure of a neuron, a stochastic integrator is used to optimize the weights in a learning machine through its "inhibitory" and "excitatory" inputs. Specifically, two AND (or XNOR) gates for the unipolar (or bipolar) representation and one stochastic integrator are used to implement the multiplications and accumulations in a GD algorithm, respectively. Thus, the SC-GDC is very area- and power-efficient. As per its formulation, the proposed SC-GDC provides an unbiased estimate of the optimized weights in a learning algorithm. The proposed SC-GDC is then used to implement a least-mean-square algorithm and a softmax regression. With similar accuracy, the proposed design achieves more than 30x improvement in throughput per area (TPA) and consumes less than 13% of the energy per training sample, compared with a fixed-point implementation. Moreover, a signed SC-GDC is proposed for training complex neural networks (NNs). It is shown that for a 784-128-128-10 fully connected NN, the signed SC-GDC produces a training result similar to that of its fixed-point counterpart, while achieving more than 90% energy savings and an 82% reduction in training time with more than 50x improvement in TPA.
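The abstract outlines the core mechanism: stochastic bit-streams are multiplied by AND (unipolar) or XNOR (bipolar) gates, and a stochastic integrator accumulates the resulting gradient stream to update a weight. The following Python sketch simulates that idea in software for a single weight trained with an LMS-style rule; it is a minimal illustration, not the paper's circuit, and the sequence length, step size, and toy target are illustrative assumptions rather than parameters from the paper.

import numpy as np

rng = np.random.default_rng(0)

def bipolar_encode(x, length):
    # Bipolar stochastic encoding: a value x in [-1, 1] maps to a bit-stream
    # in which each bit is 1 with probability (x + 1) / 2.
    return (rng.random(length) < (x + 1.0) / 2.0).astype(np.uint8)

def xnor_multiply(a_bits, b_bits):
    # Bipolar stochastic multiplication: the XNOR of two independent bipolar
    # streams encodes the product of the two encoded values.
    return np.logical_not(np.logical_xor(a_bits, b_bits)).astype(np.uint8)

def sc_gd_step(w, x, err, length=1024, mu=0.5):
    # One weight update in the spirit of the SC-GDC: form the gradient stream
    # x * err with an XNOR gate and feed it to a stochastic integrator, i.e.,
    # a counter that counts up on 1s ("excitatory") and down on 0s ("inhibitory").
    x_bits = bipolar_encode(x, length)
    e_bits = bipolar_encode(err, length)
    grad_bits = xnor_multiply(x_bits, e_bits)
    ones = int(grad_bits.sum())
    count = ones - (length - ones)          # expected value: length * x * err
    return w + mu * count / length          # w <- w + mu * x * err (on average)

# Toy LMS example: learn w such that w * x approximates d = 0.5 * x.
w = 0.0
for _ in range(300):
    x = rng.uniform(-1.0, 1.0)
    err = 0.5 * x - w * x                   # LMS error e = d - w * x
    w = sc_gd_step(w, x, err)
print(f"learned weight ~ {w:.3f} (target 0.5)")

In this sketch the up/down count plays the role of the hardware integrator, and its expected value per update equals the exact gradient term x * err, which is the sense in which the stochastic update is an unbiased estimate of the deterministic GD step.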
Pages: 2530-2541
Number of pages: 12