Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition

Cited by: 0

Authors:

Jaitly, Navdeep ^{[1]}

Nguyen, Patrick ^{[1]}

Senior, Andrew ^{[1]}

Vanhoucke, Vincent ^{[1]}

Affiliations:

[1] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada

Source:

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012

Keywords:

Deep Belief Networks; Acoustic Modeling; Artificial Neural Network; ANN/HMM;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The use of Deep Belief Networks (DBN) to pretrain Neural Networks has recently led to a resurgence in the use of Artificial Neural Network - Hidden Markov Model (ANN/HMM) hybrid systems for Automatic Speech Recognition (ASR). In this paper we report results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any reported previously with DBN-pretrained ANN/HMM systems: 5870 hours of Voice Search and 1400 hours of YouTube data. On the first dataset, the pretrained ANN/HMM system outperforms the best Gaussian Mixture Model - Hidden Markov Model (GMM/HMM) baseline, built with a much larger dataset, by 3.7% absolute word error rate (WER), while on the second dataset it outperforms the GMM/HMM baseline by 4.7% absolute. Maximum Mutual Information (MMI) fine-tuning and model combination using Segmental Conditional Random Fields (SCARF) give additional gains of 0.1% and 0.4% on the first dataset and 0.5% and 0.9% absolute on the second dataset.


Pages: 2577-2580

Page count: 4
