Toward growing modular deep neural networks for continuous speech recognition

Authors
Zohreh Ansari
Seyyed Ali Seyyedsalehi
Affiliation
[1] Amirkabir University of Technology (Tehran Polytechnic), Speech Processing Lab., Faculty of Biomedical Engineering
Keywords
Deep neural networks; Modular neural networks; Pre-training; Nonlinear filtering; Double spatiotemporal; Speaker adaptation; Continuous speech recognition
DOI
Not available
Abstract
The performance drop of typical automatic speech recognition systems in real applications is related to their poorly designed structure and training procedure. This article introduces a growing modular deep neural network (MDNN) for speech recognition, which is pre-trained in a special manner suited to its structure. The ability of the MDNN to grow enables it to represent the spatiotemporal information of the frame sequences at the input and of their labels at the output layer at the same time. A network trained with such a double spatiotemporal (DST) structure has learned the subspace of valid phonetic sequences and can therefore filter out invalid output sequences within its own structure. To improve the performance of the proposed network under speaker variations, two speaker adaptation methods are also presented. In these methods, the network learns either to move distorted input representations nonlinearly to their optimal positions or to adapt itself based on the input information. The proposed MDNN structure and its modified versions are evaluated on two Persian speech datasets: FARSDAT and Large FARSDAT. Because no frame-level transcription is available for large-vocabulary speech datasets, a semi-supervised learning algorithm is explored to train the MDNN on Large FARSDAT. Experimental results on FARSDAT show that combining the DST structure with the speaker adaptation methods achieves absolute phone accuracy rate improvements of up to 7.3% and 10.6% over the MDNN and a typical hidden Markov model, respectively. Likewise, semi-supervised training of the grown MDNN on Large FARSDAT improves its recognition performance by up to 5%.
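As a rough illustration of the double spatiotemporal idea described in the abstract, the sketch below builds training pairs in which the input is a window of consecutive frames and the target is a window of consecutive phone labels. It is a minimal sketch under stated assumptions and not the authors' implementation: the function names (`one_hot`, `window`, `make_dst_pairs`), the window sizes, the edge-repetition padding, and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def one_hot(label: int, num_classes: int) -> List[float]:
    """Return a one-hot vector for an integer phone label."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def window(seq: Sequence, center: int, half_width: int) -> list:
    """Items center-half_width .. center+half_width, repeating the edge items."""
    last = len(seq) - 1
    return [seq[min(max(i, 0), last)]
            for i in range(center - half_width, center + half_width + 1)]


def make_dst_pairs(
    frames: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
    in_half: int = 2,   # assumed input context: 2 frames on each side
    out_half: int = 1,  # assumed output context: 1 label on each side
) -> List[Tuple[List[float], List[float]]]:
    """Pair a window of frames (input) with a window of labels (target)."""
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")
    pairs = []
    for t in range(len(frames)):
        # Input: concatenated feature vectors of the neighbouring frames.
        x = [v for frame in window(frames, t, in_half) for v in frame]
        # Target: concatenated one-hot labels of the neighbouring frames.
        y = [v for lab in window(labels, t, out_half)
             for v in one_hot(lab, num_classes)]
        pairs.append((x, y))
    return pairs


# Toy data (hypothetical): 3 frames of 2 features each, 2 phone classes.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
labels = [0, 1, 1]
pairs = make_dst_pairs(frames, labels, num_classes=2, in_half=1, out_half=1)
```

Because each target covers several neighbouring labels, a network trained on such pairs is exposed to label sequences rather than isolated labels, which is the intuition behind filtering out invalid output sequences; how the actual MDNN grows and is pre-trained is not specified here.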
Pages: 1177-1196
Page count: 19