Feature Extraction Based on Pitch-Synchronous Averaging for Robust Speech Recognition

Cited by: 9
Authors
Morales-Cordovilla, Juan A. [1]
Peinado, Antonio M. [1]
Sanchez, Victoria [1]
Gonzalez, Jose A. [1]
Affiliation
[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain
Keywords
Acoustic noise; autocorrelation-based mel frequency cepstral coefficient (AMFCC); autocorrelation estimation; pitch-synchronous analysis; robust speech recognition;
DOI
10.1109/TASL.2010.2053846
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we propose two estimators for the autocorrelation sequence of a periodic signal in additive noise. Both estimators are formulated using tables that contain all the possible products of sample pairs in a speech signal frame. The first estimator is based on pitch-synchronous averaging. We analyze this estimator statistically and show that the signal-to-noise ratio (SNR) can be increased by up to a factor equal to the number of available periods. The second estimator is similar to the first, but it avoids the sample products more likely to be affected by noise. We prove that, under certain conditions, this estimator can remove the effect of additive noise in a statistical sense. Both estimators are employed to extract mel frequency cepstral coefficients (MFCCs) as features for robust speech recognition. Although these estimators were initially conceived for voiced speech frames, we extend their application to unvoiced sounds in order to obtain a coherent feature extractor. The experimental results show the superiority of the proposed approach over other MFCC-based front-ends such as higher-lag autocorrelation spectrum estimation (HASE), which also employs the idea of avoiding the autocorrelation coefficients more likely to be affected by noise.
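The SNR claim in the abstract can be illustrated with a minimal sketch. This is not the paper's table-based estimator, which the abstract does not specify in detail. It is a simplified stand-in that averages the complete pitch periods of a synthetic noisy signal and then computes a circular autocorrelation of the averaged period. The function names, the synthetic two-harmonic signal, the known pitch period of 80 samples, and the white Gaussian noise are all assumptions made for illustration.

```python
import numpy as np


def pitch_synchronous_average(frame, period):
    """Average the complete pitch periods in `frame` into one period.

    `period` is the pitch period in samples and is assumed to be known.
    """
    n_periods = len(frame) // period
    if n_periods < 1:
        raise ValueError("frame is shorter than one pitch period")
    # Stack the periods as rows, then average them sample by sample.
    periods = np.reshape(frame[: n_periods * period], (n_periods, period))
    return periods.mean(axis=0), n_periods


def circular_autocorrelation(one_period):
    """Biased circular autocorrelation of one period, lags 0..period-1."""
    spectrum = np.fft.rfft(one_period)
    n = len(one_period)
    return np.fft.irfft(np.abs(spectrum) ** 2, n=n) / n


# Synthetic example (assumed values): 5 periods of a two-harmonic signal.
rng = np.random.default_rng(0)
period, n_periods = 80, 5
n = np.arange(period * n_periods)
clean = np.sin(2 * np.pi * n / period) + 0.5 * np.sin(4 * np.pi * n / period)
noisy = clean + rng.normal(scale=0.5, size=clean.shape)

averaged, count = pitch_synchronous_average(noisy, period)

# Noise power before and after averaging; for uncorrelated noise the
# ratio should be close to the number of averaged periods.
noise_before = np.mean((noisy - clean) ** 2)
noise_after = np.mean((averaged - clean[:period]) ** 2)
gain = noise_before / noise_after

autocorr = circular_autocorrelation(averaged)
print(f"periods averaged: {count}, noise power reduction: {gain:.2f}")
```

Under these assumptions the noise power should drop by a factor near the number of averaged periods, in line with the upper bound stated in the abstract. Real speech is only quasi-periodic and its pitch period must be estimated, so the gain achieved in practice would likely be smaller.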
Pages: 640-651
Number of pages: 12
Related papers
50 in total
  • [41] Feature Extraction Based on DCT and MVDR Spectral Estimation for Robust Speech Recognition
    Seyedin, Sanaz
    Ahadi, Mohammad
    ICSP: 2008 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-5, PROCEEDINGS, 2008, : 605 - 608
  • [42] Robust feature extraction for mobile-based speech emotion recognition system
    Lee, Kang-Kue
    Cho, Youn-Ho
    Park, Kyu-Sik
    INTELLIGENT COMPUTING IN SIGNAL PROCESSING AND PATTERN RECOGNITION, 2006, 345 : 470 - 477
  • [43] PSYCHOACOUSTICAL MASKING EFFECT-BASED FEATURE EXTRACTION FOR ROBUST SPEECH RECOGNITION
    Naing, Hay Mar Soe
    Hidayat, Risanuri
    Winduratna, Bondhan
    Miyanaga, Yoshikazu
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2019, 15 (05): : 1641 - 1654
  • [44] A Pitch-Synchronous Speech Analysis and Synthesis Method for DNN-SPSS System
    Kim, Jin-Seob
    Joo, Young-Sun
    Kang, Hong-Goo
    Jang, Inseon
    Ahn, ChungHyun
    Seo, Jeongil
    2016 IEEE INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2016, : 408 - 411
  • [45] Robust Feature Extraction Methods for Speech Recognition in Noisy Environments
    Mukheolkar, Ajinkya Sunil
    Alex, John Sahaya Rani
    2014 FIRST INTERNATIONAL CONFERENCE ON NETWORKS & SOFT COMPUTING (ICNSC), 2014, : 295 - 299
  • [46] A bio-inspired feature extraction for robust speech recognition
    Zouhir, Youssef
    Ouni, Kais
    SPRINGERPLUS, 2014, 3
  • [47] Temporal modulation normalization for robust speech feature extraction and recognition
    Lu, Xugang
    Matsuda, Shigeki
    Unoki, Masashi
    Nakamura, Satoshi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2011, 52 (01) : 187 - 199
  • [48] Temporal modulation normalization for robust speech feature extraction and recognition
    Xugang Lu
    Shigeki Matsuda
    Masashi Unoki
    Satoshi Nakamura
    Multimedia Tools and Applications, 2011, 52 : 187 - 199
  • [49] A Correlational Discriminant Approach to Feature Extraction for Robust Speech Recognition
    Tomar, Vikrant Singh
    Rose, Richard C.
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 554 - 557
  • [50] Physiologically Motivated Feature Extraction for Robust Automatic Speech Recognition
    Missaoui, Ibrahim
    Lachiri, Zied
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (04) : 297 - 301