Improving Training Time of Deep Neural Network With Asynchronous Averaged Stochastic Gradient Descent

Cited: 0
Authors
You, Zhao [1 ]
Xu, Bo [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Interact Digital Media Technol Res Ctr, Beijing, Peoples R China
Keywords
deep neural network; speech recognition; asynchronous averaged SGD; one-pass learning
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep neural network (DNN) acoustic models have shown large improvements over Gaussian mixture models (GMMs) in recent studies. Stochastic gradient descent (SGD) is the most popular method for training deep neural networks. However, training DNNs with minibatch-based SGD is very slow, because it is inherently serial and must scan the whole training set many times before reaching the asymptotic region, which makes it difficult to scale to large datasets. Training time can be reduced in two ways: by reducing the number of training epochs and by using distributed training algorithms. Distributed algorithms such as L-BFGS, Hessian-free optimization, and asynchronous SGD have been shown to reduce training time significantly. To reduce training time further, we explore a training algorithm with fast convergence and combine it with distributed training. Averaged stochastic gradient descent (ASGD) has proved simple and effective for one-pass online learning. This paper investigates the asynchronous ASGD algorithm for deep neural network training. We tested asynchronous ASGD on a Mandarin Chinese recorded-speech recognition task using deep neural networks. Experimental results show that the performance of one-pass asynchronous ASGD is very close to that of multi-pass asynchronous SGD, while reducing the training time by a factor of 6.3.
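The core of ASGD is Polyak-Ruppert averaging: alongside the ordinary SGD iterate, a running average of the weights is maintained, and that average is the model actually used at test time. Below is a minimal NumPy sketch of one such update; the learning-rate schedule, the constants, and the point at which averaging starts are illustrative assumptions, not the paper's implementation.

import numpy as np

def asgd_update(w, w_avg, grad, t, lr0=0.01, t0=1000):
    # One averaged-SGD step (Polyak-Ruppert averaging).
    # w      -- current SGD iterate
    # w_avg  -- running average of iterates (the model used at test time)
    # grad   -- minibatch gradient evaluated at w
    # t      -- global step counter (t >= 1)
    # lr0/t0 -- assumed schedule constants, for illustration only
    lr = lr0 / (1.0 + t / t0)                  # decaying step size (assumed schedule)
    w = w - lr * grad                          # ordinary SGD step
    mu = 1.0 / (t - t0) if t > t0 else 1.0     # start averaging after t0 steps
    w_avg = w_avg + mu * (w - w_avg)           # incremental average of the iterates
    return w, w_avg

# Toy usage with placeholder gradients:
w = np.zeros(3)
w_avg = w.copy()
for t in range(1, 5):
    grad = np.ones(3)                          # placeholder gradient for demonstration
    w, w_avg = asgd_update(w, w_avg, grad, t)

In the asynchronous setting described in the abstract, each worker would compute grad on its own data shard and apply an update of this form against shared parameters without waiting for the other workers, which is what lets the averaged iterate reach a usable model within a single pass over the data.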
Pages: 446 - 449
Number of pages: 4
Related Papers
50 records in total
  • [1] Exploring One Pass Learning for Deep Neural Network Training with Averaged Stochastic Gradient Descent
    You, Zhao
    Wang, Xiaorui
    Xu, Bo
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [2] Accelerating deep neural network training with inconsistent stochastic gradient descent
    Wang, Linnan
    Yang, Yi
    Min, Renqiang
    Chakradhar, Srimat
    [J]. NEURAL NETWORKS, 2017, 93 : 219 - 229
  • [3] Convergence of Stochastic Gradient Descent in Deep Neural Network
    Zhou, Bai-cun
    Han, Cong-ying
    Guo, Tian-de
    [J]. ACTA MATHEMATICAE APPLICATAE SINICA-ENGLISH SERIES, 2021, 37 (01): 126 - 136
  • [4] Asynchronous Stochastic Gradient Descent for DNN Training
    Zhang, Shanshan
    Zhang, Ce
    You, Zhao
    Zheng, Rong
    Xu, Bo
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6660 - 6663
  • [5] Developing a Loss Prediction-based Asynchronous Stochastic Gradient Descent Algorithm for Distributed Training of Deep Neural Networks
    Li, Junyu
    He, Ligang
    Ren, Shenyuan
    Mao, Rui
    [J]. PROCEEDINGS OF THE 49TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2020, 2020,
  • [6] Non-convergence of stochastic gradient descent in the training of deep neural networks
    Cheridito, Patrick
    Jentzen, Arnulf
    Rossmannek, Florian
    [J]. JOURNAL OF COMPLEXITY, 2021, 64
  • [7] Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
    Bogoychev, Nikolay
    Junczys-Dowmunt, Marcin
    Heafield, Kenneth
    Aji, Alham Fikri
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 2991 - 2996
  • [8] Universality of gradient descent neural network training
    Welper, G.
    [J]. NEURAL NETWORKS, 2022, 150 : 259 - 273