Toward growing modular deep neural networks for continuous speech recognition

Cited: 0
Authors
Zohreh Ansari
Seyyed Ali Seyyedsalehi
Affiliations
[1] Speech Processing Lab., Faculty of Biomedical Engineering, Amirkabir University of Technology (Tehran Polytechnic)
Keywords
Deep neural networks; Modular neural networks; Pre-training; Nonlinear filtering; Double spatiotemporal; Speaker adaptation; Continuous speech recognition
DOI
Not available
Abstract
The performance drop of typical automatic speech recognition systems in real applications stems largely from shortcomings in their structure and training procedure. In this article, a growing modular deep neural network (MDNN) for speech recognition is introduced. In accordance with its structure, the network is pre-trained in a special manner. The MDNN's ability to grow enables it to exploit the spatiotemporal information of the frame sequences at the input layer and of their labels at the output layer simultaneously. A network trained with such a double spatiotemporal (DST) structure has learned the subspace of valid phonetic sequences and can therefore filter out invalid output sequences within its own structure. To improve the proposed network's robustness to speaker variations, two speaker adaptation methods are also presented. In these methods, the network learns either to move distorted input representations nonlinearly toward their optimal positions or to adapt itself based on the input information. To evaluate the proposed MDNN structure and its modified versions, two Persian speech datasets are used: FARSDAT and Large FARSDAT. Because no frame-level transcription is available for large-vocabulary speech datasets, a semi-supervised learning algorithm is explored to train the MDNN on Large FARSDAT. Experimental results on FARSDAT verify that the DST structure, combined with the speaker adaptation methods, achieves up to 7.3% and 10.6% absolute phone accuracy rate improvements over the MDNN and a typical hidden Markov model, respectively. Likewise, semi-supervised training of the grown MDNN on Large FARSDAT improves its recognition performance by up to 5%.
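The abstract describes the double spatiotemporal (DST) idea only at a high level. Below is a minimal, hypothetical NumPy sketch of how one module might pair a window of consecutive acoustic frames at the input with a window of consecutive phone labels at the output, which is the mechanism that would let such a network learn valid phonetic sequences. All sizes (frame dimension, phone inventory, window lengths, hidden width) and the single-hidden-layer module are illustrative assumptions, not the authors' implementation; pre-training, network growing, speaker adaptation, and the training loop itself are omitted.

```python
# Hypothetical sketch of a DST-style input/output windowing for one module.
# Sizes and the one-hidden-layer mapping are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

N_FRAMES, FRAME_DIM = 200, 39   # e.g. 13 MFCCs + deltas (assumed)
N_PHONES = 30                   # assumed phone-label inventory size
IN_CTX, OUT_CTX = 5, 3          # assumed input / output context window lengths

frames = rng.normal(size=(N_FRAMES, FRAME_DIM))     # dummy acoustic frames
labels = rng.integers(0, N_PHONES, size=N_FRAMES)   # dummy frame-level labels

def stack_context(x, ctx):
    """Concatenate each frame with its ctx-1 successors (a spatiotemporal window)."""
    T = x.shape[0] - ctx + 1
    return np.stack([x[t:t + ctx].reshape(-1) for t in range(T)])

X = stack_context(frames, IN_CTX)                   # (T, IN_CTX * FRAME_DIM)
Y_idx = np.stack([labels[t:t + OUT_CTX] for t in range(X.shape[0])])  # label window per step

# One-hot encode the output label window: the target is a short phone sequence,
# not a single phone, so the mapping is exposed to phone-transition structure.
Y = np.zeros((X.shape[0], OUT_CTX * N_PHONES))
for i, win in enumerate(Y_idx):
    for j, p in enumerate(win):
        Y[i, j * N_PHONES + p] = 1.0

# A single "module": one hidden layer mapping the input window to the output window.
H = 128
W1 = rng.normal(scale=0.01, size=(X.shape[1], H))
W2 = rng.normal(scale=0.01, size=(H, Y.shape[1]))

def forward(x):
    h = np.tanh(x @ W1)     # hidden representation
    return h @ W2           # scores for each slot of the output label window

print(forward(X[:4]).shape)  # (4, OUT_CTX * N_PHONES)
```

In the paper's framing, a network trained on such window pairs assigns low scores to implausible label windows, which is how invalid output sequences could be filtered within the network itself rather than by an external language model; this sketch only shows the data layout and forward pass under the stated assumptions.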
Pages: 1177-1196
Page count: 19
Related papers
50 items in total
  • [21] Noisy training for deep neural networks in speech recognition
    Shi Yin
    Chao Liu
    Zhiyong Zhang
    Yiye Lin
    Dong Wang
    Javier Tejedor
    Thomas Fang Zheng
    Yinguo Li
    EURASIP Journal on Audio, Speech, and Music Processing, 2015
  • [22] INVESTIGATING SPARSE DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
    Pironkov, Gueorgui
    Dupont, Stephane
    Dutoit, Thierry
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 124 - 129
  • [23] Mongolian Speech Recognition Based on Deep Neural Networks
    Zhang, Hui
    Bao, Feilong
    Gao, Guanglai
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA (CCL 2015), 2015, 9427 : 180 - 188
  • [24] On Deep and Shallow Neural Networks in Speech Recognition from Speech Spectrum
    Zelinka, Jan
    Salajka, Petr
    Mueller, Ludek
    SPEECH AND COMPUTER (SPECOM 2015), 2015, 9319 : 301 - 308
  • [25] EXPLORING DEEP NEURAL NETWORKS AND DEEP AUTOENCODERS IN REVERBERANT SPEECH RECOGNITION
    Mimura, Masato
    Sakai, Shinsuke
    Kawahara, Tatsuya
    2014 4TH JOINT WORKSHOP ON HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS (HSCMA), 2014, : 197 - 201
  • [26] Modular Construction of Time-Delay Neural Networks for Speech Recognition
    Waibel, Alex
    NEURAL COMPUTATION, 1989, 1 (01) : 39 - 46
  • [27] EXPLOITING LSTM STRUCTURE IN DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
    He, Tianxing
    Droppo, Jasha
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5445 - 5449
  • [28] FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition
    Dossou, Bonaventure F. P.
    Gbenou, Yeno K. S.
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3526 - 3531
  • [29] VERY DEEP CONVOLUTIONAL NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Qian, Yanmin
    Woodland, Philip C.
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 481 - 488
  • [30] Automatic Recognition of Kazakh Speech Using Deep Neural Networks
    Mamyrbayev, Orken
    Turdalyuly, Mussa
    Mekebayev, Nurbapa
    Alimhan, Keylan
    Kydyrbekova, Aizat
    Turdalykyzy, Tolganay
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2019, PT II, 2019, 11432 : 465 - 474