On desensitizing the Mel-cepstrum to spurious spectral components for robust speech recognition

Citations: 0
Authors
Tyagi, V [1]
Wellekens, C [1]
Affiliation
[1] Inst Eurecom, F-06904 Sophia Antipolis, France
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is well known that the peaks in the log Mel-filter bank spectrum are important cues for characterizing speech sounds. However, low-energy perturbations in the power spectrum may become numerically significant after log compression. We show that even if the spectral peaks are kept constant, low-energy perturbations in the power spectrum can create large variations in the cepstral coefficients. We show, both analytically and experimentally, that exponentiating the log Mel-filter bank spectrum before the cepstrum computation can significantly reduce the sensitivity of the cepstra to spurious low-energy perturbations. The Mel-cepstrum modulation spectrum [3] is computed from the processed cepstra, which further improves the noise robustness of the composite feature vector. In experiments with speech signals, features based on the proposed technique yield a significant increase in speech recognition performance in non-stationary noise conditions when compared directly to MFCC and RASTA-PLP features.
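The sensitivity argument in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes that "exponentiating" means raising each log filter-bank value to a power greater than 1, which the abstract does not spell out. The exponent 2, the six-band energies, the factor-of-4 perturbation, and the helper names `dct_matrix`, `cepstrum`, and `relative_change` are all hypothetical choices made for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n), built by hand to avoid extra dependencies."""
    k = np.arange(n)[:, None]   # cepstral index
    m = np.arange(n)[None, :]   # filter-bank index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    mat[0, :] /= np.sqrt(2.0)   # scale the first row so the matrix is orthonormal
    return mat

def cepstrum(log_fbank, power=1.0):
    """DCT of the log filter-bank values raised to `power` (power=1 is the plain cepstrum)."""
    return dct_matrix(len(log_fbank)) @ (log_fbank ** power)

def relative_change(clean, perturbed, power):
    """Norm of the cepstral change relative to the norm of the clean cepstrum."""
    c0 = cepstrum(np.log(clean), power)
    c1 = cepstrum(np.log(perturbed), power)
    return np.linalg.norm(c1 - c0) / np.linalg.norm(c0)

# Toy filter-bank energies: three strong peaks and three low-energy valleys.
# All energies exceed 1, so every log value is positive (assumed for the power step).
clean = np.array([1e6, 10.0, 1e5, 12.0, 8e5, 9.0])

# Perturb only the valleys; the peaks are left untouched.
perturbed = clean.copy()
perturbed[[1, 3, 5]] *= 4.0

rel_log = relative_change(clean, perturbed, power=1.0)
rel_pow = relative_change(clean, perturbed, power=2.0)

print(f"relative cepstral change, plain log:          {rel_log:.3f}")
print(f"relative cepstral change, log raised to 2.0:  {rel_pow:.3f}")
```

In this toy setting the powered version shows a smaller relative cepstral change than the plain log cepstrum, because raising to a power above 1 gives the large peak values more weight than the perturbed valleys. The sketch says nothing about recognition accuracy.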
Pages: 529 - 532
Page count: 4
Related papers
50 in total
  • [1] Speaker Recognition Based on Weighted Mel-cepstrum
    Yang Hong-wu
    Liu Ya-li
    Huang De-zhi
    [J]. ICCIT: 2009 FOURTH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCES AND CONVERGENCE INFORMATION TECHNOLOGY, VOLS 1 AND 2, 2009, : 200 - +
  • [2] Impaired speech evaluation using Mel-Cepstrum analysis
    Grigore, Ovidiu
    Grigore, Corina
    Velican, Valentin
    [J]. International Journal of Circuits, Systems and Signal Processing, 2011, 5 (01): : 70 - 77
  • [3] Speech/music discrimination using Mel-cepstrum modulation energy
    Kim, Bong-Wan
    Choi, Dae-Lim
    Lee, Yong-Ju
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2007, 4629 : 406 - +
  • [4] Mel-cepstrum modulation spectrum (MCMS) features for robust ASR
    Tyagi, V
    McCowan, L
    Misra, H
    Bourlard, H
    [J]. ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 399 - 404
  • [5] Perceptually weighted mel-cepstrum analysis of speech based on psychoacoustic model
    Yang, Hongwu
    Huang, Dezhi
    Cai, Lianhong
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (12): : 2998 - 3001
  • [6] Speaker recognition model using Two-Dimensional Mel-Cepstrum and predictive neural network
    Kitamura, T
    Takei, S
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1772 - 1775
  • [7] Evaluation of MEL-LPC cepstrum in a large vocabulary continuous speech recognition
    Matsumoto, H
    Moroto, M
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 117 - 120
  • [8] IMPROVEMENTS ON MEL-FREQUENCY CEPSTRUM MINIMUM-MEAN-SQUARE-ERROR NOISE SUPPRESSOR FOR ROBUST SPEECH RECOGNITION
    Yu, Dong
    Deng, Li
    Wu, Jian
    Gong, Yifan
    Acero, Alex
    [J]. 2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 69 - 72
  • [9] Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis
    Yin, Xiang
    Ling, Zhen-Hua
    Lei, Ming
    Dai, Li-Rong
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1146 - 1149
  • [10] Bi-mel-scale frequency cepstrum and its application in telephone speech recognition
    CHEN Jingdong
    XU Bo
    HUANG Taiyi
    [J]. Chinese Journal of Acoustics, 1998, (03) : 234 - 243