Feature Extraction Based on Pitch-Synchronous Averaging for Robust Speech Recognition

Cited by: 9
Authors
Morales-Cordovilla, Juan A. [1 ]
Peinado, Antonio M. [1 ]
Sanchez, Victoria [1 ]
Gonzalez, Jose A. [1 ]
Affiliation
[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain
Keywords
Acoustic noise; autocorrelation-based mel frequency cepstral coefficient (AMFCC); autocorrelation estimation; pitch-synchronous analysis; robust speech recognition;
DOI
10.1109/TASL.2010.2053846
CLC Number
O42 [Acoustics];
Discipline Code
070206 ; 082403 ;
Abstract
In this paper, we propose two estimators for the autocorrelation sequence of a periodic signal in additive noise. Both estimators are formulated using tables that contain all possible products of sample pairs in a speech signal frame. The first estimator is based on pitch-synchronous averaging. We analyze this estimator statistically and show that the signal-to-noise ratio (SNR) can be increased by up to a factor equal to the number of available periods. The second estimator is similar to the first, but it avoids using those sample products most likely to be affected by noise. We prove that, under certain conditions, this estimator can remove the effect of additive noise in a statistical sense. Both estimators are employed to extract mel frequency cepstral coefficients (MFCCs) as features for robust speech recognition. Although these estimators are initially conceived for voiced speech frames, we extend their application to unvoiced sounds in order to obtain a coherent feature extractor. The experimental results show the superiority of the proposed approach over other MFCC-based front-ends such as higher-lag autocorrelation spectrum estimation (HASE), which also employs the idea of avoiding the autocorrelation coefficients most likely to be affected by noise.
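The core idea of the first estimator can be illustrated with a minimal NumPy sketch (this is an assumption-laden simplification, not the authors' table-based formulation): given a frame containing K complete pitch periods, averaging the periods attenuates uncorrelated additive noise, so the SNR of the averaged waveform grows by up to a factor of K before the autocorrelation is computed. The function name `psa_autocorr` and the use of an integer pitch period estimated elsewhere are hypothetical choices for illustration.

```python
import numpy as np

def psa_autocorr(x, period, num_lags):
    """Autocorrelation via pitch-synchronous averaging (illustrative sketch).

    x        : 1-D signal frame (periodic signal plus additive noise)
    period   : integer pitch period in samples (assumed known)
    num_lags : number of autocorrelation lags to return
    """
    # Keep only the complete pitch periods and stack them as rows
    K = len(x) // period
    segments = x[:K * period].reshape(K, period)
    # Pitch-synchronous averaging: the periodic component adds coherently
    # while uncorrelated noise averages out, raising the SNR by up to K
    avg = segments.mean(axis=0)
    # Biased autocorrelation of the averaged waveform; concatenating two
    # copies exploits its periodicity so every lag uses `period` products
    ext = np.concatenate([avg, avg])
    return np.array([np.dot(avg, ext[k:k + period]) / period
                     for k in range(num_lags)])
```

For a clean periodic input the result matches the autocorrelation of a single period; with noisy input, the averaged estimate is less biased at low lags than one computed directly from the raw frame.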
Pages: 640-651
Page count: 12
Related Papers
50 records in total
  • [31] A Pitch-Synchronous Simultaneous Detection-Estimation Framework for Speech Enhancement
    Stahl, Johannes
    Mowlaee, Pejman
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (02) : 436 - 450
  • [32] Modified feature extraction methods in robust speech recognition
    Rajnoha, Josef
    Pollak, Petr
    2007 17TH INTERNATIONAL CONFERENCE RADIOELEKTRONIKA, VOLS 1 AND 2, 2007, : 337 - +
  • [33] Discriminative temporal feature extraction for robust speech recognition
    Shen, JL
    ELECTRONICS LETTERS, 1997, 33 (19) : 1598 - 1600
  • [34] Distinctive phonetic feature extraction for robust speech recognition
    Fukuda, T
    Yamamoto, W
    Nitta, T
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS: SPEECH II; INDUSTRY TECHNOLOGY TRACKS; DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS; NEURAL NETWORKS FOR SIGNAL PROCESSING, 2003, : 25 - 28
  • [35] PITCH-SYNCHRONOUS RESPONSE OF CAT COCHLEAR NERVE-FIBERS TO SPEECH SOUNDS
    HASHIMOTO, T
    KATAYAMA, Y
    MURATA, K
    TANIGUCHI, I
    JAPANESE JOURNAL OF PHYSIOLOGY, 1975, 25 (05): : 633 - 644
  • [36] Prosodic speech modifications using Pitch-Synchronous time-frequency interpolation
    Morais, ES
    Violaro, F
    Barbosa, PA
    ITS '98 PROCEEDINGS - SBT/IEEE INTERNATIONAL TELECOMMUNICATIONS SYMPOSIUM, VOLS 1 AND 2, 1998, : 225 - 230
  • [37] Combining speech enhancement and auditory feature extraction for robust speech recognition
    Kleinschmidt, M
    Tchorz, J
    Kollmeier, B
    SPEECH COMMUNICATION, 2001, 34 (1-2) : 75 - 91
  • [38] Robust Feature Extraction for Speech Recognition Based on Perceptually Motivated MUSIC and CCBC
    Han Zhiyan
    Wang Jian
    Wang Xu
    Lun Shuxian
    CHINESE JOURNAL OF ELECTRONICS, 2011, 20 (01): : 105 - 110
  • [39] A robust feature extraction method based on CZCPA model for speech recognition system
    Zhang, XY
    Jiao, ZP
    Zhao, SY
    ICEMI 2005: Conference Proceedings of the Seventh International Conference on Electronic Measurement & Instruments, Vol 3, 2005, : 89 - 92
  • [40] A robust feature extraction based on the MTF concept for speech recognition in reverberant environment
    Lu, Xugang
    Unoki, Masashi
    Akagi, Masato
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2546 - 2549