Gradient Descent Using Stochastic Circuits for Efficient Training of Learning Machines

Cited by: 26
Authors
Liu, Siting [1 ]
Jiang, Honglan [1 ]
Liu, Leibo [2 ]
Han, Jie [1 ]
Affiliations
[1] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G1H9, Canada
[2] Tsinghua Univ, Inst Microelect, Beijing 100084, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Adaptive filter (AF); gradient descent (GD); machine learning; neural networks (NNs); softmax regression (SR); stochastic computing (SC); IMPLEMENTATION; COMPUTATION; DESIGN;
DOI
10.1109/TCAD.2018.2858363
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Gradient descent (GD) is a widely used optimization algorithm in machine learning. In this paper, a novel stochastic computing GD circuit (SC-GDC) is proposed by encoding the gradient information in stochastic sequences. Inspired by the structure of a neuron, a stochastic integrator is used to optimize the weights in a learning machine through its "inhibitory" and "excitatory" inputs. Specifically, two AND (or XNOR) gates for the unipolar (or bipolar) representation and one stochastic integrator are used to implement the multiplications and accumulations in a GD algorithm, respectively. Thus, the SC-GDC is very area- and power-efficient. As per its formulation, the proposed SC-GDC provides an unbiased estimate of the optimized weights in a learning algorithm. The proposed SC-GDC is then used to implement a least-mean-square algorithm and a softmax regression. With similar accuracy, the proposed design achieves more than 30x improvement in throughput per area (TPA) and consumes less than 13% of the energy per training sample, compared with a fixed-point implementation. Moreover, a signed SC-GDC is proposed for training complex neural networks (NNs). It is shown that for a 784-128-128-10 fully connected NN, the signed SC-GDC produces a training result similar to that of its fixed-point counterpart, while achieving more than 90% energy savings and an 82% reduction in training time with more than 50x improvement in TPA.
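The abstract outlines the core mechanism: stochastic bit-streams are multiplied by AND (unipolar) or XNOR (bipolar) gates, and a stochastic integrator accumulates the resulting gradient stream to update a weight. The following Python sketch simulates that idea in software for a single weight trained with an LMS-style rule; it is a minimal illustration, not the paper's circuit, and the sequence length, step size, and toy target are illustrative assumptions rather than parameters from the paper.

import numpy as np

rng = np.random.default_rng(0)

def bipolar_encode(x, length):
    # Bipolar stochastic encoding: a value x in [-1, 1] maps to a bit-stream
    # in which each bit is 1 with probability (x + 1) / 2.
    return (rng.random(length) < (x + 1.0) / 2.0).astype(np.uint8)

def xnor_multiply(a_bits, b_bits):
    # Bipolar stochastic multiplication: the XNOR of two independent bipolar
    # streams encodes the product of the two encoded values.
    return np.logical_not(np.logical_xor(a_bits, b_bits)).astype(np.uint8)

def sc_gd_step(w, x, err, length=1024, mu=0.5):
    # One weight update in the spirit of the SC-GDC: form the gradient stream
    # x * err with an XNOR gate and feed it to a stochastic integrator, i.e.,
    # a counter that counts up on 1s ("excitatory") and down on 0s ("inhibitory").
    x_bits = bipolar_encode(x, length)
    e_bits = bipolar_encode(err, length)
    grad_bits = xnor_multiply(x_bits, e_bits)
    ones = int(grad_bits.sum())
    count = ones - (length - ones)          # expected value: length * x * err
    return w + mu * count / length          # w <- w + mu * x * err (on average)

# Toy LMS example: learn w such that w * x approximates d = 0.5 * x.
w = 0.0
for _ in range(300):
    x = rng.uniform(-1.0, 1.0)
    err = 0.5 * x - w * x                   # LMS error e = d - w * x
    w = sc_gd_step(w, x, err)
print(f"learned weight ~ {w:.3f} (target 0.5)")

In this sketch the up/down count plays the role of the hardware integrator, and its expected value per update equals the exact gradient term x * err, which is the sense in which the stochastic update is an unbiased estimate of the deterministic GD step.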
Pages: 2530-2541
Number of pages: 12