Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks

Cited by: 37
Authors
Sainath, Tara N. [1 ]
Kingsbury, Brian [1 ]
Soltau, Hagen [1 ]
Ramabhadran, Bhuvana [2 ]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10567 USA
[2] IBM Res, Multilingual Analyt, Yorktown Hts, NY 10598 USA
Keywords
Speech recognition; deep neural networks; parallel optimization techniques
DOI
10.1109/TASL.2013.2284378
CLC Classification Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
While Deep Neural Networks (DNNs) have achieved tremendous success on large vocabulary continuous speech recognition (LVCSR) tasks, training these networks is slow. To date, the most common approach to training DNNs is stochastic gradient descent, run serially on one machine. Serial training, coupled with the large number of parameters (10-50 million) and the size of speech data sets (20-100 million training points), makes DNN training very slow for LVCSR tasks. In this work, we explore a variety of optimization techniques to improve DNN training speed, including parallelization of the gradient computation during cross-entropy and sequence training, as well as reducing the number of parameters in the network through a low-rank matrix factorization. Applying the proposed optimization techniques, we show that DNN training can be sped up by a factor of 3 on a 50-hour English Broadcast News (BN) task with no loss in accuracy. Furthermore, using the proposed techniques, we are able to train DNNs on a 300-hour Switchboard (SWB) task and a 400-hour English BN task, showing relative improvements of 9-30% over a state-of-the-art GMM/HMM system while using fewer parameters than the GMM/HMM system.
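A minimal sketch of the low-rank idea the abstract describes (illustrative only; the layer sizes, rank, and code below are assumptions, not the authors' implementation): a weight matrix W of size h x o is replaced by the product of two smaller matrices A (h x r) and B (r x o), cutting the layer's parameter count from h*o to r*(h+o).

    import numpy as np

    # Hypothetical sizes: 1024 hidden units, 9300 output targets,
    # and a factorization rank of 128 (all assumed for illustration).
    h, o, r = 1024, 9300, 128
    x = np.random.randn(h)        # activations from the last hidden layer

    # Full-rank final layer: W has h * o = 9,523,200 parameters.
    W = np.random.randn(h, o)
    y_full = x @ W

    # Low-rank replacement: W ~= A @ B, with A: h x r and B: r x o.
    A = np.random.randn(h, r)
    B = np.random.randn(r, o)
    y_low = (x @ A) @ B           # same output shape, far fewer parameters

    # Parameter count drops from h*o to r*(h+o): here ~1.32M vs ~9.52M,
    # roughly an 86% reduction for this layer.
    print((h * r + r * o) / (h * o))

The abstract does not say which layer is factored; the final layer is used here only because its output dimension (the number of context-dependent targets) is typically the largest in an LVCSR DNN, so that is where a low-rank factorization removes the most parameters.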
Pages: 2267-2276
Number of pages: 10