Normalization on temporal modulation transfer function for robust speech recognition

Cited by: 3
Authors
Lu, X.
Matsuda, S.
Shimizu, T.
Nakamura, S.
DOI
10.1109/ISUC.2008.74
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a robust speech feature extraction algorithm for automatic speech recognition that reduces the effect of noise in the temporal modulation domain. The algorithm processes the time series of cepstral coefficients in two steps. The first step applies modulation contrast normalization, which brings the temporal modulation contrast of both clean and noisy speech into the same range. The second step applies edge-preserved smoothing, which attenuates the low modulation components while preserving the high modulation components (edges). We evaluated the algorithm in speech recognition experiments under two conditions. The first is additive noise (the AURORA-2J corpus). The second is reverberation (clean AURORA-2J utterances convolved with a smart-room impulse response). The ETSI advanced front-end (AFE) algorithm served as the reference for comparison. The algorithm achieved the following relative word error reduction (RWER) rates: (1) for additive noise, 5Z26% with clean-condition training (59.37% for the AFE) and 33.52% with multi-condition training (35.77% for the AFE); (2) for reverberant noise, 51.28% (10.17% for the AFE).
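The two processing steps, the reverberant test condition, and the RWER figures in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation.

The abstract does not define the contrast measure or the smoothing filter, so the sketch fills both in:

- `contrast_normalize` assumes contrast is the per-coefficient standard deviation, which reduces step 1 to per-utterance mean and variance normalization.
- `bilateral_smooth_1d` substitutes a generic 1-D bilateral filter as the edge-preserving smoother. How it maps onto the paper's "low" and "high" modulation components is also an assumption.

All function names and parameter values (`radius`, `sigma_t`, `sigma_v`) are hypothetical. `relative_wer_reduction` uses the usual definition of RWER: the percentage of the baseline's word errors that a method removes.

```python
import numpy as np


def contrast_normalize(cepstra: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Step 1 (assumed form): bring every cepstral trajectory to a common range.

    `cepstra` has shape (num_frames, num_coeffs); each column is the time series
    of one cepstral coefficient. Contrast is ASSUMED to be the per-column
    standard deviation, so this is per-utterance mean/variance normalization.
    """
    mean = cepstra.mean(axis=0, keepdims=True)
    std = cepstra.std(axis=0, keepdims=True)
    return (cepstra - mean) / (std + eps)


def bilateral_smooth_1d(x: np.ndarray, radius: int = 3,
                        sigma_t: float = 2.0, sigma_v: float = 0.5) -> np.ndarray:
    """Step 2 (stand-in filter, not the paper's): edge-preserving smoothing.

    Each output frame is a weighted mean of the frames within `radius`.
    Weights fall off with time distance (sigma_t) and with value difference
    (sigma_v), so small frame-to-frame fluctuations are averaged out while
    frames on the far side of a large jump (an edge) get almost no weight.
    """
    n = len(x)
    out = np.empty(n, dtype=float)
    for t in range(n):
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        window = x[lo:hi]
        offsets = np.arange(lo, hi) - t
        w_time = np.exp(-offsets**2 / (2.0 * sigma_t**2))
        w_value = np.exp(-(window - x[t])**2 / (2.0 * sigma_v**2))
        w = w_time * w_value  # the centre frame always has weight 1
        out[t] = np.sum(w * window) / np.sum(w)
    return out


def process_cepstra(cepstra: np.ndarray) -> np.ndarray:
    """Apply step 1 to the whole matrix, then step 2 to each coefficient."""
    normalized = contrast_normalize(cepstra)
    return np.column_stack([bilateral_smooth_1d(normalized[:, k])
                            for k in range(normalized.shape[1])])


def simulate_reverb(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Reverberant test signal: clean waveform convolved with a room impulse
    response, truncated to the length of the clean signal."""
    return np.convolve(clean, rir)[:len(clean)]


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """RWER in percent: share of the baseline's word errors that were removed."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(100, 13))  # toy data: 100 frames x 13 coefficients
    print(process_cepstra(feats).shape)  # (100, 13)
    print(relative_wer_reduction(20.0, 10.0))  # 50.0
```

Why a bilateral filter: its value kernel gives near-zero weight to frames across a large jump. That makes it one concrete way to smooth a trajectory without blurring its edges. A median filter would be another common edge-preserving choice.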
Pages: 16-23
Number of pages: 8
Related papers
  • [1] Temporal modulation normalization for robust speech feature extraction and recognition
    Lu, Xugang
    Matsuda, Shigeki
    Unoki, Masashi
    Nakamura, Satoshi
    Multimedia Tools and Applications, 2011, 52(1): 187-199
  • [2] Temporal modulation normalization for robust speech feature extraction and recognition
    Lu, Xugang
    Matsuda, Shigeki
    Unoki, Masashi
    Nakamura, Satoshi
    Proceedings of the 2009 2nd International Congress on Image and Signal Processing, Vols 1-9, 2009: 4354-4357
  • [3] Normalization of the speech modulation spectra for robust speech recognition
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(8): 1662-1674
  • [4] Temporal contrast normalization and edge-preserved smoothing of temporal modulation structures of speech for robust speech recognition
    Lu, X.
    Matsuda, S.
    Unoki, M.
    Nakamura, S.
    Speech Communication, 2010, 52(1): 1-11
  • [5] Temporal structure normalization of speech feature for robust speech recognition
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    IEEE Signal Processing Letters, 2007, 14(7): 500-503
  • [6] Temporal contrast normalization and edge-preserved smoothing on temporal modulation structure for robust speech recognition
    Lu, X.
    Matsuda, S.
    Unoki, M.
    Shimizu, T.
    Nakamura, S.
    2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols 1-8, Proceedings, 2009: 4573-4576
  • [7] Improved modulation spectrum normalization techniques for robust speech recognition
    Pan, Chi-an
    Wang, Chieh-cheng
    Hung, Jeih-weih
    2008 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-12, 2008: 4089-4092
  • [8] Temporal modulation spectral restoration for robust speech recognition
    Wang, Syu-Siang
    Tsao, Yu
    2016 IEEE Second International Conference on Multimedia Big Data (BigMM), 2016: 481-486
  • [9] Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012: 4325-4328