Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition

Cited by: 0
Authors
Jaitly, Navdeep [1]
Nguyen, Patrick [1]
Senior, Andrew [1]
Vanhoucke, Vincent [1]
Affiliations
[1] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
Keywords
Deep Belief Networks; Acoustic Modeling; Artificial Neural Network; ANN/HMM;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The use of Deep Belief Networks (DBN) to pretrain Neural Networks has recently led to a resurgence in the use of Artificial Neural Network - Hidden Markov Model (ANN/HMM) hybrid systems for Automatic Speech Recognition (ASR). In this paper we report results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any reported previously with DBN-pretrained ANN/HMM systems: 5870 hours of Voice Search and 1400 hours of YouTube data. On the first dataset, the pretrained ANN/HMM system outperforms the best Gaussian Mixture Model - Hidden Markov Model (GMM/HMM) baseline, built with a much larger dataset, by 3.7% absolute word error rate (WER), while on the second dataset it outperforms the GMM/HMM baseline by 4.7% absolute. Maximum Mutual Information (MMI) fine-tuning and model combination using Segmental Conditional Random Fields (SCARF) give additional gains of 0.1% and 0.4% on the first dataset and 0.5% and 0.9% absolute on the second dataset.
Pages: 2577-2580
Page count: 4
Related papers
50 in total
  • [1] EXPLOITING SPARSENESS IN DEEP NEURAL NETWORKS FOR LARGE VOCABULARY SPEECH RECOGNITION
    Yu, Dong
    Seide, Frank
    Li, Gang
    Deng, Li
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4409 - 4412
  • [2] Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition
    Wu, Jibin
    Yilmaz, Emre
    Zhang, Malu
    Li, Haizhou
    Tan, Kay Chen
    [J]. FRONTIERS IN NEUROSCIENCE, 2020, 14
  • [3] Large Vocabulary Speech Recognition Using Deep Tensor Neural Networks
    Yu, Dong
    Deng, Li
    Seide, Frank
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 6 - 9
  • [4] Large Vocabulary Speech Recognition Using Deep Neural Networks: Insights, Theory, and Practice
    Yu, Dong
    [J]. 2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : XXXI - XXXI
  • [5] Improving Large Vocabulary Urdu Speech Recognition System using Deep Neural Networks
    Farooq, Muhammad Umar
    Adeeba, Farah
    Rauf, Sahar
    Hussain, Sarmad
    [J]. INTERSPEECH 2019, 2019, : 2978 - 2982
  • [6] The Deep Tensor Neural Network With Applications to Large Vocabulary Speech Recognition
    Yu, Dong
    Deng, Li
    Seide, Frank
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (02): : 388 - 396
  • [7] A CLUSTER-BASED MULTIPLE DEEP NEURAL NETWORKS METHOD FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
    Zhou, Pan
    Liu, Cong
    Liu, Qingfeng
    Dai, Lirong
    Jiang, Hui
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6650 - 6654
  • [8] Emotion Recognition Using Pretrained Deep Neural Networks
    Dobes, Marek
    Sabolova, Natalia
    [J]. ACTA POLYTECHNICA HUNGARICA, 2023, 20 (04) : 195 - 204
  • [9] A Comparison of Deep Neural Network Training Methods for Large Vocabulary Speech Recognition
    Toth, Laszlo
    Grosz, Tamas
    [J]. TEXT, SPEECH, AND DIALOGUE, TSD 2013, 2013, 8082 : 36 - 43
  • [10] LARGE VOCABULARY SPEECH RECOGNITION USING NEURAL-FUZZY AND CONCEPT NETWORKS
    HATAOKA, N
    AMANO, A
    ARITSUKA, T
    ICHIKAWA, A
    [J]. LECTURE NOTES IN COMPUTER SCIENCE, 1990, 412 : 186 - 196