Toward growing modular deep neural networks for continuous speech recognition

Authors:

Zohreh Ansari

Seyyed Ali Seyyedsalehi

Affiliation:

[1] Amirkabir University of Technology (Tehran Polytechnic), Speech Processing Lab., Faculty of Biomedical Engineering

Source:

Neural Computing and Applications | 2017 / Vol. 28

Keywords:

Deep neural networks; Modular neural networks; Pre-training; Nonlinear filtering; Double spatiotemporal; Speaker adaptation; Continuous speech recognition

Abstract:

The performance of typical automatic speech recognition systems drops in real applications, which is related to shortcomings in the design of their structure and training procedure. This article introduces a growing modular deep neural network (MDNN) for speech recognition, which is pre-trained in a special manner suited to its structure. Because the MDNN can grow, it can incorporate the spatiotemporal information of the frame sequences at the input layer and of their labels at the output layer at the same time. A network trained with such a double spatiotemporal (DST) structure learns the subspace of valid phonetic sequences and can therefore filter out invalid output sequences within its own structure. To improve the network's performance under speaker variation, two speaker adaptation methods are also presented. In these methods, the network learns either to move distorted input representations nonlinearly toward their optimal positions or to adapt itself based on the input information. The proposed MDNN structure and its modified versions are evaluated on two Persian speech datasets: FARSDAT and Large FARSDAT. Because no frame-level transcription is available for large-vocabulary speech datasets, a semi-supervised learning algorithm is explored to train the MDNN on Large FARSDAT. Experimental results on FARSDAT show that combining the DST structure with the speaker adaptation methods improves the absolute phone accuracy rate by up to 7.3% over the MDNN and by up to 10.6% over a typical hidden Markov model. Likewise, semi-supervised training of the grown MDNN on Large FARSDAT improves its recognition performance by up to 5%.

Pages: 1177-1196

Number of pages: 19
