Speaker and Noise Factorization for Robust Speech Recognition

Cited by: 36
Authors
Wang, Yongqiang [1 ]
Gales, Mark J. F. [1 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Keywords
Acoustic factorization; noise robustness; speaker adaptation; vector Taylor series (VTS); HIDDEN MARKOV-MODELS; COMPENSATION; ADAPTATION;
DOI
10.1109/TASL.2012.2198059
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speech recognition systems need to operate in a wide range of conditions. Thus they should be robust to extrinsic variability caused by various acoustic factors, for example speaker differences, transmission channel and background noise. For many scenarios, multiple factors simultaneously impact the underlying "clean" speech signal. This paper examines techniques to handle both speaker and background noise differences. An acoustic factorization approach is adopted. Here, separate transforms are assigned to represent the speaker [maximum-likelihood linear regression (MLLR)], and noise and channel [model-based vector Taylor series (VTS)] factors. This is a highly flexible framework compared to the standard approaches of modeling the combined impact of both speaker and noise factors. For example, factorization allows the speaker characteristics obtained in one noise condition to be applied to a different environment. To obtain this factorization, modified versions of MLLR and VTS training and application are derived. The proposed scheme is evaluated for both adaptation and factorization on the AURORA4 data.
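The factorization idea in the abstract can be illustrated with a minimal sketch. It is not the paper's modified MLLR/VTS estimation procedure. It assumes the textbook forms of the two transforms: an affine MLLR-style mean transform for the speaker, followed by a zeroth-order VTS mean compensation written in the log-spectral domain (cepstral systems would wrap this in a DCT, and variance compensation is omitted). All function names and numeric values are hypothetical.

```python
import numpy as np

def mllr_mean(mu_clean, A, b):
    """Speaker factor: affine (MLLR-style) transform of a clean-speech mean."""
    return A @ mu_clean + b

def vts_mean(mu_x, mu_h, mu_n):
    """Noise/channel factor: zeroth-order VTS mean in the log-spectral domain,
    log(exp(x + h) + exp(n)), with channel mean mu_h and noise mean mu_n."""
    return np.logaddexp(mu_x + mu_h, mu_n)

def compensate(mu_clean, speaker, noise):
    """Apply the speaker transform first, then the noise/channel compensation."""
    A, b = speaker
    mu_h, mu_n = noise
    return vts_mean(mllr_mean(mu_clean, A, b), mu_h, mu_n)

# Hypothetical values: one speaker transform reused across two environments.
mu_clean = np.array([1.0, 2.0])
speaker = (1.1 * np.eye(2), np.array([0.1, -0.1]))
noise_a = (np.zeros(2), np.array([0.5, 0.5]))            # first environment
noise_b = (np.array([0.2, 0.2]), np.array([3.0, 3.0]))   # different environment

mean_in_a = compensate(mu_clean, speaker, noise_a)
mean_in_b = compensate(mu_clean, speaker, noise_b)
```

Because the speaker and noise parameters are held in separate objects, the same `speaker` tuple is passed unchanged when the environment switches from `noise_a` to `noise_b`. This is the property the abstract describes as applying speaker characteristics obtained in one noise condition to a different environment.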
Pages: 2149-2158
Number of pages: 10
Related papers
50 in total
  • [1] Noise robust estimate of speech dynamics for speaker recognition
    Openshaw, JP
    Mason, JS
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 925 - 928
  • [2] Efficient Speaker and Noise Normalization for Robust Speech Recognition
    Joshi, Vikas
    Bilgi, Raghavendra
    Umesh, S.
    Benitez, C.
    Garcia, L.
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2612 - 2615
  • [3] RAPID JOINT SPEAKER AND NOISE COMPENSATION FOR ROBUST SPEECH RECOGNITION
    Chin, K. K.
    Xu, Haitian
    Gales, Mark J. F.
    Breslin, Catherine
    Knill, Kate
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5500 - 5503
  • [4] Speaker normalized spectral subband parameters for noise robust speech recognition
    Tsuge, S
    Fukada, T
    Singer, H
    [J]. ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 285 - 288
  • [5] Noise Suppression based on nonnegative matrix factorization for robust speech recognition
    Fan, Hao-teng
    Lin, Pao-han
    Hung, Jeih-weih
    [J]. 2014 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE, ELECTRONICS AND ELECTRICAL ENGINEERING (ISEEE), VOLS 1-3, 2014, : 1731 - +
  • [6] An integrated study of speaker normalisation and HMM adaptation for noise robust speaker-independent speech recognition
    Hariharan, R
    Viikki, O
    [J]. SPEECH COMMUNICATION, 2002, 37 (3-4) : 349 - 361
  • [7] MULTILEVEL SPEECH INTELLIGIBILITY FOR ROBUST SPEAKER RECOGNITION
    Nemala, Sridhar Krishna
    Elhilali, Mounya
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4393 - 4396
  • [8] NONNEGATIVE MATRIX FACTORIZATION BASED NOISE ROBUST SPEAKER VERIFICATION
    Liu, S. H.
    Zou, Y. X.
    Ning, H. K.
    [J]. 2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, 2015, : 35 - 39
  • [9] Noise Robust Voice Detector for Speaker Recognition
    Hernandez, Gabriel
    Calvo, Jose R.
    Fernandez, Rafael
    Rodes, Ivis
    Martinez, Rafael
    [J]. 19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 2605 - 2608
  • [10] Noise robust speaker identification for spontaneous Arabic speech
    Graciarena, Martin
    Kajarekar, Sachin
    Stolcke, Andreas
    Shriberg, Elizabeth
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 245 - +