Robust speech recognition using probabilistic union models

Cited by: 27
Authors
Ming, J [1]
Jancovic, P [1]
Smith, FJ [1]
Affiliations
[1] Queens Univ Belfast, Dept Comp Sci, Belfast BT7 1NN, Antrim, North Ireland
Source
Funding
UK Engineering and Physical Sciences Research Council;
Keywords
acoustic modeling; noise robustness; probabilistic union models; speech recognition;
DOI
10.1109/TSA.2002.803439
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper introduces a new statistical approach, namely the probabilistic union model, for speech recognition involving partial, unknown frequency-band corruption. Partial frequency-band corruption accounts for the effect of a family of real-world noises. Previous methods based on the missing feature theory usually require the identity of the noisy bands. This identification can be difficult for unexpected noise with unknown, time-varying band characteristics. The new model combines the local frequency-band information based on the union of random events, to reduce the dependence of the model on information about the noise. This model partially accomplishes the target: offering robustness to partial frequency-band corruption, while requiring no information about the noise. This paper introduces the theory and implementation of the union model, and is focused on several important advances. These new developments include a new algorithm for automatic order selection, a generalization of the modeling principle to accommodate partial feature stream corruption, and a combination of the union model with conventional noise reduction techniques to deal with a mixture of stationary noise and unknown, nonstationary noise. For the evaluation, we used the TIDIGITS database for speaker-independent connected digit recognition. The utterances were corrupted by various types of additive noise, stationary or time-varying, assuming no knowledge about the noise characteristics. The results indicate that the new model offers significantly improved robustness in comparison to other models.
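The abstract's idea of combining band information through a union of events can be illustrated with a small sketch. The code below assumes a form that the abstract itself does not state: for N sub-band likelihoods and an order M, the score is the sum, over every subset of N - M bands, of the product of the likelihoods in that subset, with the normalization constant omitted. The function name `union_score` and the toy numbers are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import prod


def union_score(band_likelihoods, order):
    """Unnormalized union-style score (assumed form, see lead-in).

    band_likelihoods: per-band likelihood values for one frame.
    order: number of bands each term is allowed to ignore (0 <= order < N).
    """
    n = len(band_likelihoods)
    if not 0 <= order < n:
        raise ValueError("order must satisfy 0 <= order < number of bands")
    keep = n - order
    # Sum the product of likelihoods over every subset of `keep` bands.
    return sum(prod(subset) for subset in combinations(band_likelihoods, keep))


if __name__ == "__main__":
    # Toy values: band 1 is "corrupted" and receives a tiny likelihood.
    clean = [0.9, 0.8, 0.7, 0.85]
    noisy = [0.9, 1e-9, 0.7, 0.85]
    for order in (0, 1):
        ratio = union_score(noisy, order) / union_score(clean, order)
        print(f"order={order}: noisy/clean score ratio = {ratio:.3g}")
```

With order 0 the score reduces to the ordinary product over all bands, so a single corrupted band collapses it. With order 1, the term that excludes the corrupted band still contributes, and no knowledge of which band is noisy is needed.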
Pages: 403-414
Number of pages: 12
Related papers
50 items in total
  • [41] ROBUST SPEECH RECOGNITION USING DYNAMIC NOISE ADAPTATION
    Rennie, Steven
    Dognin, Pierre
    Fousek, Petr
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011: 4592-4595
  • [42] Robust speech recognition using wavelet coefficient features
    Gupta, M
    Gilbert, A
    [J]. ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001: 445-448
  • [43] ROBUST ISOLATED SPEECH RECOGNITION USING BINARY MASKS
    Karadogan, Seliz Gulsen
    Larsen, Jan
    Pedersen, Michael Syskind
    Boldt, Jesper Bunsow
    [J]. 18TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2010), 2010: 1988-1992
  • [44] Robust speech recognition using time boundary detection
    Mohajer, K
    Hu, ZM
    [J]. MULTISENSOR, MULTISOURCE INFORMATION FUSION: ARCHITECTURES, ALGORITHMS, AND APPLICATIONS 2003, 2003, 5099: 335-343
  • [45] Speech recognition using linear dynamic models
    Frankel, Joe
    King, Simon
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15(01): 246-256
  • [46] AUTOMATIC SPEECH RECOGNITION USING PSYCHOACOUSTIC MODELS
    ZWICKER, E
    TERHARDT, E
    PAULUS, E
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1979, 65(02): 487-498
  • [47] Using SVMs and discriminative models for speech recognition
    Smith, ND
    Gales, MJF
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002: 77-80
  • [48] Probabilistic Class Histogram Equalization Based on Posterior Mean Estimation for Robust Speech Recognition
    Suh, Youngjoo
    Kim, Hoirin
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2015, 22(12): 2421-2424
  • [49] Combining Multiple Acoustic Models in GMM Spaces for Robust Speech Recognition
    Kang, Byung Ok
    Kwon, Oh-Wook
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D(03): 724-730
  • [50] STRANDED GAUSSIAN MIXTURE HIDDEN MARKOV MODELS FOR ROBUST SPEECH RECOGNITION
    Zhao, Yong
    Juang, Biing-Hwang
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 4301-4304