Temporal modulation normalization for robust speech feature extraction and recognition

Cited by: 0
Authors
Xugang Lu
Shigeki Matsuda
Masashi Unoki
Satoshi Nakamura
Affiliations
[1] National Institute of Information and Communications Technology,
[2] Japan Advanced Institute of Science and Technology
Keywords
Robust speech recognition; Temporal modulation; Speech intelligibility; Edge-preserved smoothing
Abstract
Speech signals are produced by articulatory movements with a modulation structure constrained by regular phonetic sequences. This modulation structure encodes most of the speech-intelligibility information and can be used to discriminate speech from noise. In this study, we propose a noise reduction algorithm based on this modulation property of speech. The algorithm involves two steps: temporal modulation contrast normalization and modulation-event-preserving smoothing. The aim of this processing is to normalize the modulation contrast of clean and noisy speech to the same level, and to smooth out modulation artifacts caused by noise interference. Since the proposed method performs noise reduction independently, it can be combined with traditional noise reduction methods to further reduce the effect of noise. We tested the proposed method as a front-end for robust speech recognition on the AURORA-2J corpus. Two advanced noise reduction methods, the ETSI advanced front-end (AFE) and particle filtering (PF) with minimum mean square error (MMSE) estimation, were used for comparison and combination. Experimental results showed that, as an independent front-end processor, the proposed method outperforms these advanced methods, and that the combined front-ends consistently improve performance beyond either method used alone.
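The two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' formulation: it assumes per-dimension variance normalization of feature trajectories as the "contrast normalization" step, and uses a simple temporal median filter as a stand-in for the paper's modulation-event-preserving smoother. All function names here are hypothetical.

```python
import numpy as np

def modulation_contrast_normalize(features, target_std=1.0):
    """Rescale the temporal contrast (standard deviation) of each feature
    trajectory to a common target level, keeping the temporal mean intact.

    features: array of shape (T, D) -- T frames, D feature dimensions.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + 1e-8) * target_std + mean

def edge_preserved_smooth(features, kernel=3):
    """Median-filter each trajectory along time; medians suppress small
    noise-induced fluctuations while keeping sharp modulation edges."""
    T, _ = features.shape
    pad = kernel // 2
    padded = np.pad(features, ((pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(features)
    for t in range(T):
        out[t] = np.median(padded[t:t + kernel], axis=0)
    return out

# Typical front-end usage on a (frames x coefficients) feature matrix:
# feats = modulation_contrast_normalize(feats, target_std=0.5)
# feats = edge_preserved_smooth(feats, kernel=3)
```

In practice such processing would be applied to spectral or cepstral feature trajectories before recognition, so that clean and noisy speech share the same modulation contrast level.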
Pages: 187-199 (12 pages)
Related papers
50 entries in total
  • [1] Temporal modulation normalization for robust speech feature extraction and recognition
    Lu, Xugang
    Matsuda, Shigeki
    Unoki, Masashi
    Nakamura, Satoshi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2011, 52 (01) : 187 - 199
  • [2] Temporal modulation normalization for robust speech feature extraction and recognition
    Lu, Xugang
    Matsuda, Shigeki
    Unoki, Masashi
    Nakamura, Satoshi
    PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOLS 1-9, 2009, : 4354 - 4357
  • [3] Temporal structure normalization of speech feature for robust speech recognition
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    IEEE SIGNAL PROCESSING LETTERS, 2007, 14 (07) : 500 - 503
  • [4] Normalization on temporal modulation transfer function for robust speech recognition
    Lu, X.
    Matsuda, S.
    Shimizu, T.
    Nakamura, S.
    PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON UNIVERSAL COMMUNICATION, 2008, : 16 - 23
  • [5] Discriminative temporal feature extraction for robust speech recognition
    Shen, JL
    ELECTRONICS LETTERS, 1997, 33 (19) : 1598 - 1600
  • [6] Normalization of the Speech Modulation Spectra for Robust Speech Recognition
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (08): : 1662 - 1674
  • [7] Speech feature extraction based on wavelet modulation scale for robust speech recognition
    Ma, Xin
    Zhou, Weidong
    Ju, Fang
    Jiang, Qi
    NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS, 2006, 4233 : 499 - 505
  • [8] FEATURE EXTRACTION WITH A MULTISCALE MODULATION ANALYSIS FOR ROBUST AUTOMATIC SPEECH RECOGNITION
    Mueller, Florian
    Mertins, Alfred
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7427 - 7431
  • [9] Feature extraction for robust speech recognition
    Dharanipragada, S
    2002 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL II, PROCEEDINGS, 2002, : 855 - 858
  • [10] Temporal contrast normalization and edge-preserved smoothing of temporal modulation structures of speech for robust speech recognition
    Lu, X.
    Matsuda, S.
    Unoki, M.
    Nakamura, S.
    SPEECH COMMUNICATION, 2010, 52 (01) : 1 - 11