Combining Multiple Acoustic Models in GMM Spaces for Robust Speech Recognition

Cited: 4
Authors
Kang, Byung Ok [1,2]
Kwon, Oh-Wook [2]
Affiliations
[1] ETRI, SW Content Res Lab, Daejeon, South Korea
[2] Chungbuk Natl Univ, Sch Elect Engn, Cheongju, South Korea
Keywords
noise-robust speech recognition; acoustic model; GMM combination; non-native speech recognition
DOI
10.1587/transinf.2015EDP7252
Chinese Library Classification
TP [Automation technology; computer technology]
Subject Classification
0812
Abstract
We propose a new method to combine multiple acoustic models in Gaussian mixture model (GMM) spaces for robust speech recognition. Although large vocabulary continuous speech recognition (LVCSR) systems have recently become widespread, they often make egregious recognition errors caused by unavoidable mismatch in speaking styles or environments between training and real conditions. To handle this problem, the conventional multi-style training approach trains a large acoustic model on a large speech database covering various speaking styles and environmental noise. In this work, by contrast, we combine multiple sub-models trained for different speaking styles or environmental noise into a large acoustic model by maximizing the log-likelihood of the sub-model states that share the same phonetic context and position. The combined acoustic model is then used in a new target system that is robust to variation in speaking style and diverse environmental noise. Experimental results show that the proposed method significantly outperforms conventional methods in two tasks: non-native English speech recognition for second-language learning systems and noise-robust point-of-interest (POI) recognition for car navigation systems.
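To make the idea of combining sub-models in GMM space concrete, the sketch below pools the Gaussian components of several GMMs that are tied to the same phonetic state into one larger GMM. This is only an illustrative simplification under stated assumptions: the paper selects the combination by maximizing the log-likelihood over the shared states, whereas this sketch simply concatenates components with per-model prior weights; the function name and data layout are hypothetical.

```python
import numpy as np

def combine_gmms(gmms, model_priors):
    """Pool components of several GMMs (tied to the same phonetic
    state) into one larger GMM.

    gmms: list of (weights, means, covariances) per sub-model,
          where weights is length-K, means is (K, D), covariances
          is (K, D) for diagonal covariances.
    model_priors: relative weight of each sub-model (hypothetical
          parameter; the paper instead optimizes the combination
          by maximizing log-likelihood over shared states).
    """
    weights, means, covs = [], [], []
    for (w, mu, cov), prior in zip(gmms, model_priors):
        # Scale each sub-model's mixture weights by its prior,
        # then collect all components into one pool.
        weights.append(np.asarray(w, dtype=float) * prior)
        means.append(np.asarray(mu, dtype=float))
        covs.append(np.asarray(cov, dtype=float))
    w = np.concatenate(weights)
    # Renormalize so the combined mixture weights sum to 1.
    return w / w.sum(), np.vstack(means), np.vstack(covs)
```

For example, combining two 2-component GMMs for the same tied state yields a 4-component GMM whose weights still sum to one; a practical system would typically follow this with component pruning or merging to keep the model size manageable.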
Pages: 724-730 (7 pages)