Combining Multiple Acoustic Models in GMM Spaces for Robust Speech Recognition

Cited by: 4
Authors
Kang, Byung Ok [1, 2]
Kwon, Oh-Wook [2]
Affiliations
[1] ETRI, SW Content Res Lab, Daejeon, South Korea
[2] Chungbuk Natl Univ, Sch Elect Engn, Cheongju, South Korea
Keywords
noise-robust speech recognition; acoustic model; GMM combination; non-native speech recognition
DOI
10.1587/transinf.2015EDP7252
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a new method to combine multiple acoustic models in Gaussian mixture model (GMM) spaces for robust speech recognition. Although large vocabulary continuous speech recognition (LVCSR) systems have recently become widespread, they often make egregious recognition errors caused by an unavoidable mismatch in speaking style or environment between training and real-world conditions. The conventional remedy is multi-style training, in which a single large acoustic model is trained on a large speech database covering various speaking styles and kinds of environmental noise. In this work, we instead combine multiple sub-models, each trained for a different speaking style or type of environmental noise, into one large acoustic model by maximizing the log-likelihood of the sub-model states that share the same phonetic context and position. The combined acoustic model is then used in a new target system, making it robust to variation in speaking style and to diverse environmental noise. Experimental results show that the proposed method significantly outperforms the conventional methods in two tasks: non-native English speech recognition for second-language learning systems and noise-robust point-of-interest (POI) recognition for car navigation systems.
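The abstract does not spell out the combination procedure, so the following is only an illustrative sketch of one simple way to merge GMMs under a maximum-likelihood criterion, not the authors' algorithm. It assumes that each sub-model contributes one diagonal-covariance GMM for a given tied state, that the Gaussian components are kept fixed, and that only per-sub-model interpolation weights are re-estimated by EM on target-domain frames. All function names and the `(weights, means, variances)` tuple layout are hypothetical.

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Log density of K diagonal Gaussians at T frames.

    x: (T, D); means, variances: (K, D). Returns an array of shape (T, K).
    """
    diff = x[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, K)
    return -0.5 * (log_norm[None, :] + mahal)


def gmm_loglik(x, weights, means, variances):
    """Per-frame log-likelihood of a GMM, computed with a stable log-sum-exp."""
    lp = log_gauss_diag(x, means, variances) + np.log(weights)[None, :]
    peak = lp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(lp - peak).sum(axis=1, keepdims=True)))[:, 0]


def estimate_interpolation_weights(x, submodels, n_iter=20, floor=1e-12):
    """EM over sub-model weights; Gaussian parameters stay fixed (assumption).

    submodels: list of (weights, means, variances) tuples for the same state.
    """
    lam = np.full(len(submodels), 1.0 / len(submodels))
    # Frame log-likelihood under each sub-model's GMM, shape (T, M).
    ll = np.stack([gmm_loglik(x, *g) for g in submodels], axis=1)
    for _ in range(n_iter):
        # E-step: posterior probability of each sub-model for each frame.
        lp = ll + np.log(lam)[None, :]
        lp -= lp.max(axis=1, keepdims=True)
        post = np.exp(lp)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: new weight is the average posterior, floored to avoid log(0).
        lam = np.maximum(post.mean(axis=0), floor)
        lam /= lam.sum()
    return lam


def combine_submodels(submodels, lam):
    """Pool all components into one GMM, scaling weights by lam."""
    weights = np.concatenate([l * g[0] for l, g in zip(lam, submodels)])
    means = np.concatenate([g[1] for g in submodels])
    variances = np.concatenate([g[2] for g in submodels])
    return weights, means, variances
```

In this sketch the combined state model is simply the union of all sub-model components, so its size grows with the number of sub-models. Because EM starts from uniform interpolation weights and never decreases the likelihood, the resulting combined GMM fits the adaptation frames at least as well as an equal-weight merge would.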
Pages: 724-730
Number of pages: 7