On desensitizing the Mel-cepstrum to spurious spectral components for robust speech recognition

Cited by: 0

Authors:

Tyagi, V [1]

Wellekens, C [1]

Affiliations:

[1] Inst Eurecom, F-06904 Sophia Antipolis, France

Source:

2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING | 2005

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

It is well known that the peaks in the log Mel-filter bank spectrum are important cues for characterizing speech sounds. However, low-energy perturbations in the power spectrum may become numerically significant after log compression. We show that even if the spectral peaks are kept constant, low-energy perturbations in the power spectrum can create large variations in the cepstral coefficients. We show, both analytically and experimentally, that exponentiating the log Mel-filter bank spectrum before the cepstrum computation can significantly reduce the sensitivity of the cepstra to spurious low-energy perturbations. The Mel-cepstrum modulation spectrum [3] is then computed from the processed cepstra, which further improves the noise robustness of the composite feature vector. In experiments with speech signals, features based on the proposed technique yield a significant increase in speech recognition performance in non-stationary noise conditions when compared directly with MFCC and RASTA-PLP features.
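The sensitivity argument in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the 20-band filter bank energies, the peak and floor values, the helper names `dct_ortho` and `relative_change`, and the exponent `gamma = 2` are all illustrative assumptions. The abstract does not state the exact form of the exponentiation, so raising the log spectrum to the power `gamma` is only one possible reading.

```python
import numpy as np


def dct_ortho(x):
    """Orthonormal DCT-II of a 1-D array (hypothetical helper, all coefficients kept)."""
    n = len(x)
    k = np.arange(n)[:, None]          # coefficient index
    m = np.arange(n)[None, :]          # band index
    basis = np.cos(np.pi * (m + 0.5) * k / n)
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)


def relative_change(reference, perturbed):
    """Norm of the difference, relative to the norm of the reference."""
    return np.linalg.norm(perturbed - reference) / np.linalg.norm(reference)


# Assumed toy filter bank energies: two strong peaks on a low-energy floor.
n_bands = 20
clean = np.full(n_bands, 2.0)
clean[[5, 12]] = 1.0e4

# Perturb only the low-energy bands; the peaks stay exactly the same.
perturbed = clean.copy()
low = clean < 1.0e3
perturbed[low] = 20.0

log_clean = np.log(clean)
log_pert = np.log(perturbed)

# Standard route: cepstrum = DCT of the log filter bank spectrum.
rel_power = relative_change(clean, perturbed)
rel_cep_log = relative_change(dct_ortho(log_clean), dct_ortho(log_pert))

# Assumed variant: raise the log spectrum to a power before the DCT.
# All log values are positive here because every energy is above 1.
gamma = 2.0
rel_cep_exp = relative_change(dct_ortho(log_clean ** gamma),
                              dct_ortho(log_pert ** gamma))

print("relative change, power spectrum:        ", rel_power)
print("relative change, log cepstrum:          ", rel_cep_log)
print("relative change, exponentiated variant: ", rel_cep_exp)
```

With these assumed values, the perturbation is well under 1% of the power-spectrum norm but more than half of the cepstral norm after log compression, and the relative change is smaller for the exponentiated variant. Because the DCT here is orthonormal and untruncated, the cepstral ratios equal the corresponding ratios in the (exponentiated) log-spectral domain.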


Pages: 529-532

Number of pages: 4

50 items in total

[1] Speaker Recognition Based on Weighted Mel-cepstrum
Yang Hong-wu
Liu Ya-li
Huang De-zhi
[J]. ICCIT: 2009 FOURTH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCES AND CONVERGENCE INFORMATION TECHNOLOGY, VOLS 1 AND 2, 2009, : 200 - +
[2] Impaired speech evaluation using Mel-Cepstrum analysis
Grigore, Ovidiu
Grigore, Corina
Velican, Valentin
[J]. International Journal of Circuits, Systems and Signal Processing, 2011, 5 (01): : 70 - 77
[3] Speech/music discrimination using Mel-cepstrum modulation energy
Kim, Bong-Wan
Choi, Dae-Lim
Lee, Yong-Ju
[J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2007, 4629 : 406 - +
[4] Mel-cepstrum modulation spectrum (MCMS) features for robust ASR
Tyagi, V
McCowan, L
Misra, H
Bourlard, H
[J]. ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 399 - 404
[5] Perceptually weighted mel-cepstrum analysis of speech based on psychoacoustic model
Yang, Hongwu
Huang, Dezhi
Cai, Lianhong
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (12): : 2998 - 3001
[6] Speaker recognition model using Two-Dimensional Mel-Cepstrum and predictive neural network
Kitamura, T
Takei, S
[J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1772 - 1775
[7] Evaluation of MEL-LPC cepstrum in a large vocabulary continuous speech recognition
Matsumoto, H
Moroto, M
[J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 117 - 120
[8] IMPROVEMENTS ON MEL-FREQUENCY CEPSTRUM MINIMUM-MEAN-SQUARE-ERROR NOISE SUPPRESSOR FOR ROBUST SPEECH RECOGNITION
Yu, Dong
Deng, Li
Wu, Jian
Gong, Yifan
Acero, Alex
[J]. 2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 69 - 72
[9] Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis
Yin, Xiang
Ling, Zhen-Hua
Lei, Ming
Dai, Li-Rong
[J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1146 - 1149
[10] Bi-mel-scale frequency cepstrum and its application in telephone speech recognition
CHEN Jingdong
XU Bo
HUANG Taiyi
[J]. Chinese Journal of Acoustics, 1998, (03) : 234 - 243
