Normalization on temporal modulation transfer function for robust speech recognition

Cited by: 3
Authors
Lu, X.
Matsuda, S.
Shimizu, T.
Nakamura, S.
DOI
10.1109/ISUC.2008.74
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a robust speech feature extraction algorithm for automatic speech recognition that reduces the effect of noise in the temporal modulation domain. The algorithm processes the time series of cepstral coefficients in two steps. The first step applies modulation contrast normalization so that the temporal modulation contrast of clean and noisy speech falls in the same range. The second step applies edge-preserved smoothing to attenuate the low modulation components while preserving the high modulation components (edges). We tested the algorithm in speech recognition experiments under both an additive noise condition (the AURORA-2J data corpus) and a reverberant noise condition (clean speech utterances from AURORA-2J convolved with a smart room impulse response). The ETSI advanced front-end (AFE) algorithm was used for comparison. The results show that the algorithm achieved: (1) for additive noise, a 5Z26% relative word error reduction (RWER) rate for clean-condition training (59.37% for AFE) and a 33.52% RWER rate for multi-condition training (35.77% for AFE); (2) for reverberant noise, a 51.28% RWER rate (10.17% for AFE).
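The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formulas, so this sketch assumes per-trajectory mean and variance normalization as a stand-in for modulation contrast normalization, and a running median as a stand-in for edge-preserved smoothing. The function names `contrast_normalize`, `median_smooth`, `process`, and `rwer`, and the `width` parameter, are hypothetical. The `rwer` helper uses the common definition of relative word error reduction, which is assumed here to match the paper's.

```python
from statistics import mean, median, pstdev


def contrast_normalize(track, eps=1e-8):
    """Scale one cepstral-coefficient trajectory to zero mean and unit variance.

    Assumption: a simple stand-in for the paper's modulation contrast
    normalization, whose exact formula is not given in the abstract.
    """
    mu = mean(track)
    sigma = pstdev(track)
    return [(x - mu) / (sigma + eps) for x in track]


def median_smooth(track, width=5):
    """Running median over time with edge padding.

    Assumption: a median filter is used as a generic edge-preserving
    smoother; the paper's own smoother may differ.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    # Repeat the first and last frames so the output keeps the input length.
    padded = [track[0]] * half + list(track) + [track[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(track))]


def process(track, width=5):
    """Apply normalization first, then smoothing, to one trajectory."""
    return median_smooth(contrast_normalize(track), width)


def rwer(baseline_wer, system_wer):
    """Relative word error reduction in percent (common definition)."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    # Toy trajectory: an isolated spike followed by a sustained step (edge).
    toy = [0, 0, 0, 9, 0, 0, 5, 5, 5, 5]
    print(median_smooth(toy, width=3))
    print(process([float(x) for x in toy], width=3))
```

In a full front end, `process` would be applied independently to each cepstral coefficient's trajectory across the frames of an utterance. In the toy example, the running median suppresses the isolated spike while keeping the sustained step, which is the behavior an edge-preserving smoother is chosen for.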
Pages: 16-23
Page count: 8
Related papers
50 in total
  • [31] Within-Class Feature Normalization for Robust Speech Recognition
    Liao, Yuan-Fu
    Hsu, Chi-Hui
    Yang, Chi-Min
    Lin, Jeng-Shien
    Chang, Sen-Chia
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1020 - 1023
  • [32] A Cepstral PDF Normalization Method for Noise Robust Speech Recognition
    Suk, Yong Ho
    Choi, Seung Ho
    ADVANCES IN COMPUTER SCIENCE, ENVIRONMENT, ECOINFORMATICS, AND EDUCATION, PT II, 2011, 215 : 34 - +
  • [33] Robust speech recognition using the modulation spectrogram
    Kingsbury, BED
    Morgan, N
    Greenberg, S
    SPEECH COMMUNICATION, 1998, 25 (1-3) : 117 - 132
  • [34] Modulation spectrum equalization for robust speech recognition
    Sun, Liang-Che
    Hsu, Chang-Wen
    Lee, Lin-Shan
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 81 - 86
  • [35] Temporal Speech Normalization Methods Comparison in Speech Recognition Using Neural Network
    Salam, Md Sah Bin Hj
    Mohamad, Dzulkifli
    Salleh, Sheikh Hussain Shaikh
    2009 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION, 2009, : 442 - 447
  • [36] Modulation Spectrum Augmentation for Robust Speech Recognition
    Yan, Bi-Cheng
    Liu, Shih-Hung
    Chen, Berlin
    PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION SCIENCE AND SYSTEM, AISS 2019, 2019,
  • [37] Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems
    Schaedler, Marc Rene
    Kollmeier, Birger
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1810 - 1813
  • [38] The Modulation Transfer Function for Speech Intelligibility
    Elliott, Taffeta M.
    Theunissen, Frederic E.
    PLOS COMPUTATIONAL BIOLOGY, 2009, 5 (03)
  • [39] Feature Normalization Using Structured Full Transforms for Robust Speech Recognition
    Xiao, Xiong
    Li, Jinyu
    Chng, Eng Siong
    Li, Haizhou
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 700 - +
  • [40] Cepstral Feature Normalization Methods Using Pole Filtering and Scale Normalization for Robust Speech Recognition
    Choi, Bo Kyeong
    Ban, Sung Min
    Kim, Hyung Soon
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2015, 34 (04) : 316 - 320