On the use of evolutionary algorithms to improve the robustness of continuous speech recognition systems in adverse conditions

Cited by: 1
Authors
Selouani, SA
O'Shaughnessy, D
Affiliations
[1] Univ Moncton, Secteur Gest Informat, Shippegan, NB E8S 1P6, Canada
[2] Univ Quebec, INRS Energie Mat Telecommun, Montreal, PQ H5A 1K6, Canada
Keywords
speech recognition; genetic algorithms; Karhunen-Loeve transform; hidden Markov models; robustness
DOI
10.1155/S1110865703302070
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Limiting the decrease in performance due to acoustic environment changes remains a major challenge for continuous speech recognition (CSR) systems. We propose a novel approach that combines the Karhunen-Loeve transform (KLT) in the mel-frequency domain with a genetic algorithm (GA) to enhance the data representing corrupted speech. The idea is to project noisy speech parameters onto the space spanned by the genetically optimized principal axes obtained from the KLT. The enhanced parameters increase the recognition rate in highly interfering noise environments. The proposed hybrid technique, when included in the front end of an HTK-based CSR system, outperforms the conventional recognition process in severe interfering car noise environments over a wide range of signal-to-noise ratios (SNRs), from 16 dB down to -4 dB. We also show the effectiveness of the KLT-GA method in recognizing speech subject to telephone channel degradations.
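The projection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fitness function, the genetic operators, or the training data, so the squared-distance fitness, the uniform crossover with Gaussian mutation, the synthetic "clean" and "noisy" feature matrices, and all function names below are assumptions chosen only to show how KLT axes might be refined by a GA and then used for projection.

```python
import numpy as np


def klt_basis(features):
    """Return KLT (PCA) axes as columns, ordered by decreasing eigenvalue."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order]


def project(features, axes):
    """Project feature vectors (rows) onto the given axes (columns)."""
    return features @ axes


def fitness(axes, noisy, clean, clean_axes):
    """Assumed fitness: negative mean squared distance between the projected
    noisy features and the clean features projected on the clean KLT axes."""
    return -np.mean((project(noisy, axes) - project(clean, clean_axes)) ** 2)


def evolve_axes(noisy, clean, generations=30, pop_size=20, sigma=0.05, seed=0):
    """Refine the noisy-speech KLT axes with a small elitist GA (illustrative)."""
    rng = np.random.default_rng(seed)
    clean_axes = klt_basis(clean)
    start = klt_basis(noisy)
    # Initial population: the plain KLT axes plus randomly perturbed copies.
    population = [start] + [
        start + sigma * rng.standard_normal(start.shape)
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        scores = np.array([fitness(a, noisy, clean, clean_axes) for a in population])
        ranked = [population[i] for i in np.argsort(scores)[::-1]]
        parents = ranked[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            mask = rng.random(start.shape) < 0.5  # uniform crossover
            child = np.where(mask, parents[i], parents[j])
            child = child + sigma * rng.standard_normal(start.shape)  # mutation
            children.append(child)
        population = parents + children
    scores = [fitness(a, noisy, clean, clean_axes) for a in population]
    return start, population[int(np.argmax(scores))], clean_axes


# Synthetic stand-ins for mel-frequency feature frames (rows = frames).
data_rng = np.random.default_rng(1)
clean = data_rng.standard_normal((200, 4)) @ data_rng.standard_normal((4, 4))
noisy = clean + 0.5 * data_rng.standard_normal(clean.shape)

start_axes, best_axes, clean_axes = evolve_axes(noisy, clean)
enhanced = project(noisy, best_axes)  # "enhanced" parameters under these assumptions
```

Because the best half of each generation is carried over unchanged, the fitness of the returned axes is never lower than that of the plain KLT axes. Whether a comparable fitness criterion is used in the paper cannot be determined from the abstract.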
Pages: 814-823
Number of pages: 10
Related papers
50 in total
  • [11] Speech recognition in adverse conditions by humans and machines
    Patman, Chloe
    Chodroff, Eleanor
    JASA EXPRESS LETTERS, 2024, 4 (11):
  • [12] The performance of automated speech recognition systems under adverse conditions of human exertion
    Entwistle, MS
    INTERNATIONAL JOURNAL OF HUMAN-COMPUTER INTERACTION, 2003, 16 (02) : 127 - 140
  • [13] Particle Swarm Optimization to Improve Robustness of Distributed Speech Recognition
    Daalache, Mohamed Rafik Laid
    Selouani, Sid-Ahmed
    Boudraa, Malika
    Addou, Djamel
    ACTA ACUSTICA UNITED WITH ACUSTICA, 2017, 103 (04) : 616 - 623
  • [14] Noise and speaker robustness in a Persian continuous speech recognition system
    Veisi, Hadi
    Sameti, Hossein
    2007 9TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1-3, 2007 : 73 - 76
  • [15] Separation of Emotional and Reconstruction Embeddings on Ladder Network to Improve Speech Emotion Recognition Robustness in Noisy Conditions
    Leem, Seong-Gyun
    Fulford, Daniel
    Onnela, Jukka-Pekka
    Gard, David
    Busso, Carlos
    INTERSPEECH 2021, 2021 : 2871 - 2875
  • [16] PARALLEL ALGORITHMS FOR SYLLABLE RECOGNITION IN CONTINUOUS SPEECH.
    De Mori, Renato
    Laface, Pietro
    Yu Mong
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 1985, PAMI-7 (01) : 56 - 69
  • [17] USE OF SYNTACTIC CONSTRAINTS FOR CONTINUOUS SPEECH RECOGNITION
    QUINTON, P
    TSI-TECHNIQUE ET SCIENCE INFORMATIQUES, 1982, 1 (03) : 233 - 248
  • [18] USE OF ASSOCIATIVE INFORMATION TO CONTINUOUS SPEECH RECOGNITION
    SEKIGUCHI, Y
    SHIGENAGA, M
    SYSTEMS AND COMPUTERS IN JAPAN, 1995, 26 (12) : 96 - 106
  • [19] USE OF A SYNTACTIC ANALYZER IN CONTINUOUS SPEECH RECOGNITION
    QUINTON, P
    ANNALES DES TELECOMMUNICATIONS-ANNALS OF TELECOMMUNICATIONS, 1977, 32 (9-10) : 323 - 336
  • [20] Feature extraction algorithms to improve the speech emotion recognition rate
    Koduru, Anusha
    Valiveti, Hima Bindu
    Budati, Anil Kumar
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (01) : 45 - 55