Combining Multiple Acoustic Models in GMM Spaces for Robust Speech Recognition

Cited by: 4
Authors
Kang, Byung Ok [1 ,2 ]
Kwon, Oh-Wook [2 ]
Affiliations
[1] ETRI, SW Content Res Lab, Daejeon, South Korea
[2] Chungbuk Natl Univ, Sch Elect Engn, Cheongju, South Korea
Keywords
noise-robust speech recognition; acoustic model; GMM combination; non-native speech recognition;
DOI
10.1587/transinf.2015EDP7252
CLC number: TP [Automation technology, computer technology]
Subject classification code: 0812
Abstract
We propose a new method for combining multiple acoustic models in Gaussian mixture model (GMM) spaces for robust speech recognition. Although large vocabulary continuous speech recognition (LVCSR) systems have recently become widespread, they often make egregious recognition errors caused by unavoidable mismatches in speaking style or environment between training and real-world conditions. To handle this problem, the conventional multi-style training approach trains a large acoustic model on a large speech database covering a variety of speaking styles and environmental noise. In this work, by contrast, we combine multiple sub-models, each trained for a different speaking style or noise environment, into a single large acoustic model by maximizing the log-likelihood of the sub-model states that share the same phonetic context and position. The combined acoustic model is then used in a new target system that is robust to variation in speaking style and to diverse environmental noise. Experimental results show that the proposed method significantly outperforms conventional methods on two tasks: non-native English speech recognition for second-language learning systems and noise-robust point-of-interest (POI) recognition for car navigation systems.
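The combination step described above can be illustrated with a minimal sketch. Assuming each sub-model represents a tied HMM state as a diagonal-covariance GMM, one simple way to merge the states sharing a phonetic context and position is to pool their mixture components under per-model prior weights. This is an illustrative approximation only, not the paper's exact log-likelihood-maximizing procedure; the function name and data layout below are hypothetical.

```python
def combine_state_gmms(sub_models, priors):
    """Merge the GMMs that several sub-models assign to one tied HMM state.

    Each sub-model entry is (weights, means, variances): `weights` is a
    list of mixture weights summing to 1, and `means`/`variances` hold
    per-component diagonal-Gaussian parameters.  `priors` weights each
    sub-model's contribution.  Pooling components this way keeps the
    merged mixture a valid GMM; it approximates, rather than reproduces,
    the paper's likelihood-maximizing combination.
    """
    total = float(sum(priors))
    weights, means, variances = [], [], []
    for prior, (w, m, v) in zip(priors, sub_models):
        # Rescale each sub-model's weights so the pooled weights sum to 1.
        weights.extend(prior / total * wi for wi in w)
        means.extend(m)
        variances.extend(v)
    return weights, means, variances

# Two toy 2-component GMMs (1-dim features) for the same phonetic state,
# e.g. one trained on clean speech and one on noisy speech:
clean = ([0.4, 0.6], [[0.0], [1.0]], [[1.0], [1.0]])
noisy = ([0.5, 0.5], [[0.2], [1.3]], [[2.0], [2.0]])
w, mu, var = combine_state_gmms([clean, noisy], priors=[1.0, 1.0])
print(len(w), round(sum(w), 6))  # 4 components, weights sum to 1.0
```

With equal priors, the merged state simply doubles its component count while remaining a properly normalized mixture; unequal priors would bias the combined model toward one speaking style or noise condition.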
Pages: 724-730 (7 pages)