BAYESIAN AND GAUSSIAN PROCESS NEURAL NETWORKS FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

Times cited: 0
Authors
Hu, Shoukang [1 ]
Lam, Max W. Y. [1 ]
Xie, Xurong [1 ]
Liu, Shansong [1 ]
Yu, Jianwei [1 ]
Wu, Xixin [1 ]
Liu, Xunying [1 ]
Meng, Helen [1 ]
Affiliation
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
Bayesian Neural Network; Gaussian Process Neural Network; activation function selection; speech recognition;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The hidden activation functions inside deep neural networks (DNNs) play a vital role in learning high-level discriminative features and controlling the information flows to track longer history. However, the fixed model parameters used in standard DNNs can lead to over-fitting and poor generalization when given limited training data. Furthermore, the precise forms of activations used in DNNs are often manually set at a global level for all hidden nodes, thus lacking an automatic selection method. To address these issues, Bayesian neural network (BNN) acoustic models are proposed in this paper to explicitly model the uncertainty associated with DNN parameters. DNN and LSTM acoustic models based on Gaussian process (GP) activations are also used in this paper to allow the optimal forms of hidden activations to be stochastically learned for individual hidden nodes. An efficient variational-inference-based training algorithm is derived for BNN, GPNN and GPLSTM systems. Experiments were conducted on an LVCSR system trained on a 75-hour subset of Switchboard I data. The best BNN and GPNN systems outperformed both the baseline DNN systems constructed using fixed-form activations and their combination via frame-level joint decoding by 1% absolute in word error rate.
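The idea of modeling parameter uncertainty with variational inference can be illustrated with a minimal sketch. It is not the training algorithm derived in the paper: it assumes a mean-field Gaussian posterior over the weights of a single sigmoid node, a standard normal prior, and hypothetical helper names (`softplus`, `sample_weights`, `kl_to_standard_normal`, `bayesian_node`) chosen only for this illustration.

```python
import math
import random

def softplus(x):
    # Numerically stable log(1 + exp(x)); maps rho to a positive std. dev.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sample_weights(mu, rho, rng):
    # Reparameterization: w = mu + sigma * eps with eps ~ N(0, 1).
    return [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu, rho)]

def kl_to_standard_normal(mu, rho):
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over all weights.
    # The N(0, 1) prior is an assumption made for this sketch.
    total = 0.0
    for m, r in zip(mu, rho):
        sigma = softplus(r)
        total += 0.5 * (sigma ** 2 + m ** 2 - 1.0) - math.log(sigma)
    return total

def bayesian_node(x, mu, rho, rng, n_samples=100):
    # Monte Carlo estimate of the expected sigmoid output under q(w).
    outputs = []
    for _ in range(n_samples):
        w = sample_weights(mu, rho, rng)
        a = sum(wi * xi for wi, xi in zip(w, x))
        outputs.append(1.0 / (1.0 + math.exp(-a)))
    return sum(outputs) / len(outputs)

if __name__ == "__main__":
    rng = random.Random(0)
    mu = [0.5, -0.3]
    rho = [-1.0, -1.0]
    print("expected output:", bayesian_node([1.0, 2.0], mu, rho, rng))
    print("KL penalty:", kl_to_standard_normal(mu, rho))
```

In a variational training loop, a KL term of this kind would be added to the data-fit loss, so that the posterior over weights stays close to the prior when training data is limited.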
Pages: 6555 - 6559
Page count: 5
Related papers
50 in total
  • [31] Towards speech rate independence in large vocabulary continuous speech recognition
    Martinez, F
    Tapias, D
    Alvarez, J
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 725 - 728
  • [32] Large Vocabulary Speech Recognition Using Deep Neural Networks: Insights, Theory, and Practice
    Yu, Dong
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : XXXI - XXXI
  • [33] Improving Large Vocabulary Urdu Speech Recognition System using Deep Neural Networks
    Farooq, Muhammad Umar
    Adeeba, Farah
    Rauf, Sahar
    Hussain, Sarmad
    INTERSPEECH 2019, 2019, : 2978 - 2982
  • [34] Parallel Scalability in Speech Recognition Inference engines in large vocabulary continuous speech recognition
    You, Kisun
    Chong, Jike
    Yi, Youngmin
    Gonina, Ekaterina
    Hughes, Christopher J.
    Chen, Yen-Kuang
    Sung, Wonyong
    Keutzer, Kurt
    IEEE SIGNAL PROCESSING MAGAZINE, 2009, 26 (06) : 124 - 135
  • [35] State-based Gaussian selection in large vocabulary continuous speech recognition using HMM's
    Gales, MJF
    Knill, KM
    Young, SJ
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1999, 7 (02) : 152 - 161
  • [36] A Segmental CRF Approach to Large Vocabulary Continuous Speech Recognition
    Zweig, Geoffrey
    Nguyen, Patrick
    2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 152 - 157
  • [37] A large vocabulary continuous speech recognition system for Persian language
    Sameti, Hossein
    Veisi, Hadi
    Bahrani, Mohammad
    Babaali, Bagher
    Hosseinzadeh, Khosro
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2011, : 1 - 12
  • [39] A LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION SYSTEM WITH HIGH PREDICTABILITY
    SHIGENAGA, M
    SEKIGUCHI, Y
    YAMAGUCHI, T
    MASUDA, R
    IEICE TRANSACTIONS ON COMMUNICATIONS ELECTRONICS INFORMATION AND SYSTEMS, 1991, 74 (07) : 1817 - 1825
  • [40] Feature selection in mandarin large vocabulary continuous speech recognition
    Zhu, X
    Chen, YN
    Liu, J
    Liu, RS
    2002 6TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I AND II, 2002, : 508 - 511