BAYESIAN AND GAUSSIAN PROCESS NEURAL NETWORKS FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

Cited by: 0

Authors:

Hu, Shoukang [1]

Lam, Max W. Y. [1]

Xie, Xurong [1]

Liu, Shansong [1]

Yu, Jianwei [1]

Wu, Xixin [1]

Liu, Xunying [1]

Meng, Helen [1]

Affiliations:

[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

Bayesian Neural Network; Gaussian Process Neural Network; activation function selection; speech recognition;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The hidden activation functions inside deep neural networks (DNNs) play a vital role in learning high-level discriminative features and controlling the information flows to track longer history. However, the fixed model parameters used in standard DNNs can lead to over-fitting and poor generalization when given limited training data. Furthermore, the precise forms of activations used in DNNs are often manually set at a global level for all hidden nodes, thus lacking an automatic selection method. To address these issues, Bayesian neural network (BNN) acoustic models are proposed in this paper to explicitly model the uncertainty associated with DNN parameters. Gaussian Process (GP) activation based DNN and LSTM acoustic models are also used in this paper to allow the optimal forms of hidden activations to be stochastically learned for individual hidden nodes. An efficient variational inference based training algorithm is derived for BNN, GPNN and GPLSTM systems. Experiments were conducted on an LVCSR system trained on a 75-hour subset of Switchboard I data. The best BNN and GPNN systems outperformed both the baseline DNN systems constructed using fixed-form activations and their combination via frame-level joint decoding by 1% absolute in word error rate.
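To make the idea of modelling parameter uncertainty concrete, the following is a minimal sketch of one stochastic unit trained under variational inference. It is not the authors' implementation: the abstract does not state the prior, the posterior family, or the activation, so the diagonal Gaussian posterior, the standard normal prior, the sigmoid activation, and all function names here are assumptions chosen for illustration.

```python
import math
import random

def sample_weight(mu, log_sigma, rng):
    """Reparameterised draw w = mu + sigma * eps with eps ~ N(0, 1)."""
    return mu + math.exp(log_sigma) * rng.gauss(0.0, 1.0)

def kl_to_standard_normal(mu, log_sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) for a single scalar weight."""
    sigma_sq = math.exp(2.0 * log_sigma)
    return 0.5 * (sigma_sq + mu * mu - 1.0) - log_sigma

def bayesian_neuron(inputs, mus, log_sigmas, rng):
    """One stochastic forward pass through a single sigmoid unit (assumed activation)."""
    pre = sum(x * sample_weight(m, s, rng)
              for x, m, s in zip(inputs, mus, log_sigmas))
    return 1.0 / (1.0 + math.exp(-pre))

rng = random.Random(0)
mus = [0.5, -0.3]          # hypothetical posterior means
log_sigmas = [-1.0, -1.0]  # hypothetical posterior log standard deviations

# Monte Carlo estimate of the predictive output: average over weight samples.
outputs = [bayesian_neuron([1.0, 2.0], mus, log_sigmas, rng) for _ in range(100)]
predictive_mean = sum(outputs) / len(outputs)

# KL penalty that a variational objective would add to the data-fit term.
kl_total = sum(kl_to_standard_normal(m, s) for m, s in zip(mus, log_sigmas))
```

In a variational training loop of this kind, the loss would typically be the expected negative log-likelihood under sampled weights plus `kl_total`, with gradients flowing to `mus` and `log_sigmas` through the reparameterised sample.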

Pages: 6555-6559

Number of pages: 5

50 items in total

[31] Towards speech rate independence in large vocabulary continuous speech recognition
Martinez, F
Tapias, D
Alvarez, J
PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 725 - 728
[32] Large Vocabulary Speech Recognition Using Deep Neural Networks: Insights, Theory, and Practice
Yu, Dong
2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : XXXI - XXXI
[33] Improving Large Vocabulary Urdu Speech Recognition System using Deep Neural Networks
Farooq, Muhammad Umar
Adeeba, Farah
Rauf, Sahar
Hussain, Sarmad
INTERSPEECH 2019, 2019, : 2978 - 2982
[34] Parallel Scalability in Speech Recognition Inference engines in large vocabulary continuous speech recognition
You, Kisun
Chong, Jike
Yi, Youngmin
Gonina, Ekaterina
Hughes, Christopher J.
Chen, Yen-Kuang
Sung, Wonyong
Keutzer, Kurt
IEEE SIGNAL PROCESSING MAGAZINE, 2009, 26 (06) : 124 - 135
[35] State-based Gaussian selection in large vocabulary continuous speech recognition using HMM's
Gales, MJF
Knill, KM
Young, SJ
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1999, 7 (02) : 152 - 161
[36] A Segmental CRF Approach to Large Vocabulary Continuous Speech Recognition
Zweig, Geoffrey
Nguyen, Patrick
2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 152 - 157
[37] A large vocabulary continuous speech recognition system for Persian language
Sameti, Hossein
Veisi, Hadi
Bahrani, Mohammad
Babaali, Bagher
Hosseinzadeh, Khosro
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2011, : 1 - 12
[38] A review of large-vocabulary continuous-speech recognition
Young, S
IEEE SIGNAL PROCESSING MAGAZINE, 1996, 13 (05) : 45 - 57
[39] A LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION SYSTEM WITH HIGH PREDICTABILITY
SHIGENAGA, M
SEKIGUCHI, Y
YAMAGUCHI, T
MASUDA, R
IEICE TRANSACTIONS ON COMMUNICATIONS ELECTRONICS INFORMATION AND SYSTEMS, 1991, 74 (07) : 1817 - 1825
[40] Feature selection in mandarin large vocabulary continuous speech recognition
Zhu, X
Chen, YN
Liu, J
Liu, RS
2002 6TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I AND II, 2002, : 508 - 511
