Combining Multiple Acoustic Models in GMM Spaces for Robust Speech Recognition

Cited by: 4
Authors
Kang, Byung Ok [1 ,2 ]
Kwon, Oh-Wook [2 ]
Affiliations
[1] ETRI, SW Content Res Lab, Daejeon, South Korea
[2] Chungbuk Natl Univ, Sch Elect Engn, Cheongju, South Korea
Source
Keywords
noise-robust speech recognition; acoustic model; GMM combination; non-native speech recognition
DOI
10.1587/transinf.2015EDP7252
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a new method for combining multiple acoustic models in Gaussian mixture model (GMM) spaces for robust speech recognition. Although large-vocabulary continuous speech recognition (LVCSR) systems have recently become widespread, they often make egregious recognition errors caused by unavoidable mismatches in speaking style or environment between training and real conditions. The conventional remedy is multi-style training, in which a single large acoustic model is trained on a large speech database covering various speaking styles and kinds of environmental noise. In this work, we instead combine multiple sub-models, each trained for a different speaking style or type of environmental noise, into one large acoustic model by maximizing the log-likelihood of the sub-model states that share the same phonetic context and position. The combined acoustic model is then used in a new target system, making it robust to variation in speaking style and to diverse environmental noise. Experimental results show that the proposed method significantly outperforms the conventional methods in two tasks: non-native English speech recognition for second-language learning systems and noise-robust point-of-interest (POI) recognition for car navigation systems.
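As a rough illustration of the idea in the abstract, the sketch below pools the Gaussian components of two sub-model states into one GMM and picks the sub-model priors that maximize the log-likelihood of some frames. The abstract does not spell out the actual combination procedure, so everything here is an assumption made for illustration: the 1-D features, the toy parameters, the function names (`combine_states`, `fit_priors`), and the use of EM over the sub-model priors with the component parameters held fixed.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def gmm_logpdf(x, gmm):
    """Log density of a 1-D GMM given as a list of (weight, mean, var)."""
    return logsumexp([math.log(w) + gaussian_logpdf(x, mu, var)
                      for w, mu, var in gmm])

def total_loglik(data, gmm):
    """Sum of per-frame log-likelihoods under one GMM."""
    return sum(gmm_logpdf(x, gmm) for x in data)

def combine_states(sub_gmms, priors):
    """Pool the components of several sub-model states into one GMM,
    scaling each sub-model's component weights by its prior."""
    return [(p * w, mu, var)
            for p, gmm in zip(priors, sub_gmms)
            for w, mu, var in gmm]

def fit_priors(data, sub_gmms, iterations=20):
    """EM over the sub-model priors only (hypothetical simplification);
    the Gaussian means, variances, and inner weights stay fixed."""
    k = len(sub_gmms)
    priors = [1.0 / k] * k
    for _ in range(iterations):
        counts = [0.0] * k
        for x in data:
            # E-step: posterior probability that frame x came from each sub-model.
            logs = [math.log(max(p, 1e-300)) + gmm_logpdf(x, g)
                    for p, g in zip(priors, sub_gmms)]
            norm = logsumexp(logs)
            for i, lg in enumerate(logs):
                counts[i] += math.exp(lg - norm)
        # M-step: new priors are the average posteriors.
        priors = [c / len(data) for c in counts]
    return priors

# Toy states standing in for sub-models that share one phonetic context.
clean_state = [(0.6, 0.0, 1.0), (0.4, 2.0, 1.0)]
noisy_state = [(1.0, 5.0, 4.0)]
frames = [0.1, 1.9, 4.8, 5.5, 0.3, 2.2]

priors = fit_priors(frames, [clean_state, noisy_state])
combined = combine_states([clean_state, noisy_state], priors)
uniform = combine_states([clean_state, noisy_state], [0.5, 0.5])
```

Because EM never decreases the likelihood and starts from uniform priors, `total_loglik(frames, combined)` should be at least as large as `total_loglik(frames, uniform)`. A real system would work with multivariate features and HMM state alignments, which this sketch leaves out.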
Pages: 724-730
Number of pages: 7
Related papers
50 in total
  • [1] ON COMBINING DNN AND GMM WITH UNSUPERVISED SPEAKER ADAPTATION FOR ROBUST AUTOMATIC SPEECH RECOGNITION
    Liu, Shilin
    Sim, Khe Chai
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [2] COMBINING SPEECH RECOGNITION AND ACOUSTIC WORD EMOTION MODELS FOR ROBUST TEXT-INDEPENDENT EMOTION RECOGNITION
    Schuller, Bjoern
    Vlasenko, Bogdan
    Arsic, Dejan
    Rigoll, Gerhard
    Wendemuth, Andreas
    2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, 2008, : 1333 - +
  • [3] Combining acoustic and articulatory feature information for robust speech recognition
    Kirchhoff, K
    Fink, GA
    Sagerer, G
    SPEECH COMMUNICATION, 2002, 37 (3-4) : 303 - 319
  • [4] Constructing accurate and robust HMM/GMM models for an Arabic speech recognition system
    Khelifa M.O.M.
    Elhadj Y.M.
    Abdellah Y.
    Belkasmi M.
    International Journal of Speech Technology, 2017, 20 (04) : 937 - 949
  • [5] A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition
    Xiao, Xiong
    Li, Jinyu
    Chng, Eng Siong
    Li, Haizhou
    Lee, Chin-Hui
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06) : 1158 - 1169
  • [6] ROBUST SPEECH RECOGNITION USING MULTIPLE PRIOR MODELS FOR SPEECH RECONSTRUCTION
    Narayanan, Arun
    Zhao, Xiaojia
    Wang, DeLiang
    Fosler-Lussier, Eric
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4800 - 4803
  • [7] Lecture Speech Recognition by Combining Word Graphs of Various Acoustic Models
    Kosaka, Tetsuo
    Goto, Keisuke
    Ito, Takashi
    Kato, Masaharu
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2978 - 2981
  • [8] Towards Robust Indonesian Speech Recognition with Spontaneous-Speech Adapted Acoustic Models
    Hoesen, Devin
    Satriawan, Cil Hardianto
    Lestari, Dessi Puji
    Khodra, Masayu Leylia
    SLTU-2016 5TH WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGIES FOR UNDER-RESOURCED LANGUAGES, 2016, 81 : 167 - 173
  • [9] GMM-BASED ACOUSTIC MODELING FOR EMBEDDED SPEECH RECOGNITION
    Levy, Christophe
    Linares, Georges
    Bonastre, Jean-Francois
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1726 - 1729
  • [10] DOMAIN EXPANSION IN DNN-BASED ACOUSTIC MODELS FOR ROBUST SPEECH RECOGNITION
    Ghorbani, Shahram
    Khorram, Soheil
    Hansen, John H. L.
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 107 - 113