Improving Training Time of Deep Neural Network With Asynchronous Averaged Stochastic Gradient Descent

Cited: 0
Authors
You, Zhao [1 ]
Xu, Bo [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Interact Digital Media Technol Res Ctr, Beijing, Peoples R China
Keywords
deep neural network; speech recognition; asynchronous averaged SGD; one-pass learning
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep neural network (DNN) acoustic models have shown large improvements over Gaussian mixture models (GMMs) in recent studies. Stochastic gradient descent (SGD) is the most popular method for training deep neural networks. However, training DNNs with minibatch-based SGD is very slow, because it is inherently serial and must scan the whole training set many times before reaching the asymptotic region, which makes it difficult to scale to large datasets. Training time can be reduced in two ways: by reducing the number of training epochs and by using distributed training algorithms. Distributed algorithms such as L-BFGS, Hessian-free optimization, and asynchronous SGD have been shown to reduce training time significantly. To reduce training time further, we explore a training algorithm with fast convergence and combine it with distributed training. Averaged stochastic gradient descent (ASGD) has proved simple and effective for one-pass online learning. This paper investigates the asynchronous ASGD algorithm for deep neural network training. We tested asynchronous ASGD on a Mandarin Chinese recorded-speech recognition task using deep neural networks. Experimental results show that the performance of one-pass asynchronous ASGD is very close to that of multi-pass asynchronous SGD, while reducing the training time by a factor of 6.3.
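The core of ASGD is Polyak-Ruppert averaging: alongside the ordinary SGD iterate, a running average of the weights is maintained, and that average is the model actually used at test time. Below is a minimal NumPy sketch of one such update; the learning-rate schedule, the constants, and the point at which averaging starts are illustrative assumptions, not the paper's implementation.

import numpy as np

def asgd_update(w, w_avg, grad, t, lr0=0.01, t0=1000):
    # One averaged-SGD step (Polyak-Ruppert averaging).
    # w      -- current SGD iterate
    # w_avg  -- running average of iterates (the model used at test time)
    # grad   -- minibatch gradient evaluated at w
    # t      -- global step counter (t >= 1)
    # lr0/t0 -- assumed schedule constants, for illustration only
    lr = lr0 / (1.0 + t / t0)                  # decaying step size (assumed schedule)
    w = w - lr * grad                          # ordinary SGD step
    mu = 1.0 / (t - t0) if t > t0 else 1.0     # start averaging after t0 steps
    w_avg = w_avg + mu * (w - w_avg)           # incremental average of the iterates
    return w, w_avg

# Toy usage with placeholder gradients:
w = np.zeros(3)
w_avg = w.copy()
for t in range(1, 5):
    grad = np.ones(3)                          # placeholder gradient for demonstration
    w, w_avg = asgd_update(w, w_avg, grad, t)

In the asynchronous setting described in the abstract, each worker would compute grad on its own data shard and apply an update of this form against shared parameters without waiting for the other workers, which is what lets the averaged iterate reach a usable model within a single pass over the data.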
Pages: 446 - 449
Number of pages: 4
Related Papers
50 records in total
  • [1] Exploring One Pass Learning for Deep Neural Network Training with Averaged Stochastic Gradient Descent
    You, Zhao
    Wang, Xiaorui
    Xu, Bo
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [2] Accelerating deep neural network training with inconsistent stochastic gradient descent
    Wang, Linnan
    Yang, Yi
    Min, Renqiang
    Chakradhar, Srimat
    [J]. NEURAL NETWORKS, 2017, 93 : 219 - 229
  • [3] Convergence of Stochastic Gradient Descent in Deep Neural Network
    Zhou, Bai-cun
    Han, Cong-ying
    Guo, Tian-de
    [J]. ACTA MATHEMATICAE APPLICATAE SINICA-ENGLISH SERIES, 2021, 37 (01): 126 - 136
  • [4] Asynchronous Stochastic Gradient Descent for DNN Training
    Zhang, Shanshan
    Zhang, Ce
    You, Zhao
    Zheng, Rong
    Xu, Bo
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6660 - 6663
  • [5] Developing a Loss Prediction-based Asynchronous Stochastic Gradient Descent Algorithm for Distributed Training of Deep Neural Networks
    Li, Junyu
    He, Ligang
    Ren, Shenyuan
    Mao, Rui
    [J]. PROCEEDINGS OF THE 49TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2020, 2020,
  • [6] Non-convergence of stochastic gradient descent in the training of deep neural networks
    Cheridito, Patrick
    Jentzen, Arnulf
    Rossmannek, Florian
    [J]. JOURNAL OF COMPLEXITY, 2021, 64
  • [7] Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
    Bogoychev, Nikolay
    Junczys-Dowmunt, Marcin
    Heafield, Kenneth
    Aji, Alham Fikri
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 2991 - 2996
  • [8] Universality of gradient descent neural network training
    Welper, G.
    [J]. NEURAL NETWORKS, 2022, 150 : 259 - 273