Feature Extraction Based on Pitch-Synchronous Averaging for Robust Speech Recognition

Cited by: 9
Authors
Morales-Cordovilla, Juan A. [1]
Peinado, Antonio M. [1]
Sanchez, Victoria [1]
Gonzalez, Jose A. [1]
Affiliations
[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain
Keywords
Acoustic noise; autocorrelation-based mel frequency cepstral coefficient (AMFCC); autocorrelation estimation; pitch-synchronous analysis; robust speech recognition;
DOI
10.1109/TASL.2010.2053846
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we propose two estimators for the autocorrelation sequence of a periodic signal in additive noise. Both estimators are formulated using tables that contain all the possible products of sample pairs in a speech signal frame. The first estimator is based on pitch-synchronous averaging. We analyze this estimator statistically and show that the signal-to-noise ratio (SNR) can be increased by up to a factor equal to the number of available periods. The second estimator is similar to the first, but it avoids the sample products most likely to be affected by noise. We prove that, under certain conditions, this estimator can remove the effect of additive noise in a statistical sense. Both estimators are employed to extract mel frequency cepstral coefficients (MFCCs) as features for robust speech recognition. Although these estimators were initially conceived for voiced speech frames, we extend their application to unvoiced sounds in order to obtain a coherent feature extractor. The experimental results show the superiority of the proposed approach over other MFCC-based front-ends such as higher-lag autocorrelation spectrum estimation (HASE), which also avoids the autocorrelation coefficients most likely to be affected by noise.
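
The idea behind the first estimator can be illustrated with a minimal Python sketch. It is not the paper's table-based formulation: it assumes the pitch period is already known as an integer number of samples, averages the whole periods of a frame into one period, and takes the circular autocorrelation of that averaged period. The function names (`pitch_sync_average`, `circular_autocorr`, `demo`) and all parameter values are hypothetical, chosen only for illustration. With white noise of equal variance in each period, averaging P periods reduces the noise power by roughly a factor of P, which is in line with the SNR gain stated in the abstract.

```python
import math
import random


def pitch_sync_average(frame, period):
    """Average the whole pitch periods in `frame` into a single period.

    Assumption: `period` is a known, integer pitch period in samples.
    Samples beyond the last whole period are ignored.
    """
    n_periods = len(frame) // period
    if n_periods < 1:
        raise ValueError("frame must contain at least one full period")
    return [
        sum(frame[n + p * period] for p in range(n_periods)) / n_periods
        for n in range(period)
    ]


def circular_autocorr(one_period):
    """Circular autocorrelation of one (averaged) period, lags 0..T-1."""
    T = len(one_period)
    return [
        sum(one_period[n] * one_period[(n + k) % T] for n in range(T)) / T
        for k in range(T)
    ]


def demo(period=40, n_periods=8, noise_std=0.5, seed=0):
    """Compare noise power before and after pitch-synchronous averaging.

    Uses a synthetic sine wave plus white Gaussian noise (illustrative
    values only, not taken from the paper).
    """
    rng = random.Random(seed)
    clean = [math.sin(2 * math.pi * n / period)
             for n in range(period * n_periods)]
    noisy = [s + rng.gauss(0.0, noise_std) for s in clean]

    averaged = pitch_sync_average(noisy, period)

    # Mean squared error against the clean signal, i.e. residual noise power.
    err_raw = sum((a - b) ** 2 for a, b in zip(noisy, clean)) / len(clean)
    err_avg = sum((averaged[n] - clean[n]) ** 2
                  for n in range(period)) / period
    return err_raw, err_avg, circular_autocorr(averaged)


if __name__ == "__main__":
    err_raw, err_avg, acf = demo()
    print("noise power, raw frame:      ", err_raw)
    print("noise power, averaged period:", err_avg)
    print("autocorrelation at lag 0:    ", acf[0])
```

Expanding the autocorrelation of the averaged period gives a sum over products of sample pairs whose positions differ by a lag plus a multiple of the period, which is one way to relate this sketch to the product tables mentioned in the abstract. The second estimator, which skips the products most affected by noise, is not covered here.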
Pages: 640-651
Number of pages: 12
Related papers
50 in total
  • [1] PITCH-SYNCHRONOUS DIGITAL FEATURE EXTRACTION SYSTEM FOR PHONEMIC RECOGNITION OF SPEECH
    HESS, WJ
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1976, 24 (01): : 14 - 25
  • [2] A pitch-synchronous peak-amplitude based feature extraction method for noise robust asr
    Ghulam, Muhammad
    Horikawa, Junsei
    Nitta, Tsuneo
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 505 - 508
  • [3] NOTE ON PITCH-SYNCHRONOUS PROCESSING OF SPEECH
    DAVID, EE
    MCDONALD, HS
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1956, 28 (06): : 1261 - 1266
  • [4] A NOTE ON PITCH-SYNCHRONOUS PROCESSING OF SPEECH
    DAVID, EE
    MCDONALD, HS
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1956, 28 (01): : 159 - 159
  • [5] Pitch-synchronous ZCPA (PS-ZCPA)-based feature extraction with auditory masking
    Ghulam, M
    Fukuda, T
    Horikawa, J
    Nitta, T
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 517 - 520
  • [6] Pitch-synchronous single frequency filtering spectrogram for speech emotion recognition
    Shruti Gupta
    Md. Shah Fahad
    Akshay Deepak
    Multimedia Tools and Applications, 2020, 79 : 23347 - 23365
  • [7] Pitch-synchronous single frequency filtering spectrogram for speech emotion recognition
    Gupta, Shruti
    Fahad, Md. Shah
    Deepak, Akshay
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (31-32) : 23347 - 23365
  • [8] A PITCH-SYNCHRONOUS ANALYSIS OF HOARSENESS IN RUNNING SPEECH
    MUTA, H
    BAER, T
    WAGATSUMA, K
    MURAOKA, T
    FUKUDA, H
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1988, 84 (04): : 1292 - 1301
  • [9] Pitch-synchronous peak-amplitude (PS-PA)-based feature extraction method for noise-robust ASR
    Ghulam, Muhammad
    Katsurada, Kouichi
    Horikawa, Junsei
    Nitta, Tsuneo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (11) : 2766 - 2774
  • [10] Pitch-synchronous linear-prediction analysis for automatic speech recognition systems
    Guerchi, D
    Hmimia, A
    8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL VI, PROCEEDINGS: IMAGE, ACOUSTIC, SIGNAL PROCESSING AND OPTICAL SYSTEMS, TECHNOLOGIES AND APPLICATIONS, 2004, : 430 - 434