Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition

被引:0
|
作者
Jaitly, Navdeep [1 ]
Patrick Nguyen [1 ]
Senior, Andrew [1 ]
Vanhoucke, Vincent [1 ]
机构
[1] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
关键词
Deep Belief Networks; Acoustic Modeling; Artificial Neural Network; ANN/HMM;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The use of Deep Belief Networks (DBN) to pretrain Neural Networks has recently led to a resurgence in the use of Artificial Neural Network - Hidden Markov Model (ANN/HMM) hybrid systems for Automatic Speech Recognition (ASR). In this paper we report results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any reported previously with DBN-pretrained ANN/HMM systems - 5870 hours of Voice Search and 1400 hours of You Tube data. On the first dataset, the pretrained ANN/HMM system outperforms the best Gaussian Mixture Model - Hidden Markov Model (GMM/HMM) baseline, built with a much larger dataset by 3.7% absolute WER, while on the second dataset, it outperforms the GMM/HMM baseline by 4.7% absolute. Maximum Mutual Information (MMI) fine tuning and model combination using Segmental Conditional Random Fields (SCARF) give additional gains of 0.1% and 0.4% on the first dataset and 0.5% and 0.9% absolute on the second dataset.
引用
收藏
页码:2577 / 2580
页数:4
相关论文
共 50 条
  • [31] Large vocabulary speech recognition in French
    Adda-Decker, M
    Adda, G
    Gauvain, JL
    Lamel, L
    [J]. ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 45 - 48
  • [32] RECURRENT DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Weng, Chao
    Yu, Dong
    Watanabe, Shinji
    Juang, Biing-Hwang
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [33] FAST TRAINING OF DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
    Gong, Guojing
    Kingsbury, Brian
    Yang, Chih-Chieh
    Liu, Tianyi
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6884 - 6888
  • [34] Speech emotion recognition with deep convolutional neural networks
    Issa, Dias
    Demirci, M. Fatih
    Yazici, Adnan
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 59
  • [35] Noisy training for deep neural networks in speech recognition
    Yin, Shi
    Liu, Chao
    Zhang, Zhiyong
    Lin, Yiye
    Wang, Dong
    Tejedor, Javier
    Zheng, Thomas Fang
    Li, Yinguo
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015, : 1 - 14
  • [36] Advances in Large Vocabulary Speech Recognition
    Gauvain, JL
    De Mori, R
    Lamel, L
    [J]. COMPUTER SPEECH AND LANGUAGE, 2002, 16 (01): : 1 - 3
  • [37] Emotional Speech Recognition Using Deep Neural Networks
    Trinh Van, Loan
    Dao Thi Le, Thuy
    Le Xuan, Thanh
    Castelli, Eric
    [J]. SENSORS, 2022, 22 (04)
  • [38] Noisy training for deep neural networks in speech recognition
    Shi Yin
    Chao Liu
    Zhiyong Zhang
    Yiye Lin
    Dong Wang
    Javier Tejedor
    Thomas Fang Zheng
    Yinguo Li
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2015
  • [39] Deep Neural Networks for Acoustic Modeling in Speech Recognition
    Hinton, Geoffrey
    Deng, Li
    Yu, Dong
    Dahl, George E.
    Mohamed, Abdel-rahman
    Jaitly, Navdeep
    Senior, Andrew
    Vanhoucke, Vincent
    Patrick Nguyen
    Sainath, Tara N.
    Kingsbury, Brian
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06) : 82 - 97
  • [40] A NETWORK OF DEEP NEURAL NETWORKS FOR DISTANT SPEECH RECOGNITION
    Ravanelli, Mirco
    Brakel, Philemon
    Omologo, Maurizio
    Bengio, Yoshua
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4880 - 4884