Enhancing Pitch Robustness of Speech Recognition System through Spectral Smoothing

Authors
Sai, B. Tarun [1]
Yadav, Ishwar Chandra [1]
Shahnawazuddin, S. [1]
Pradhan, Gayadhar [1]
Affiliation
[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna, Bihar, India
Keywords
Speech recognition; pitch mismatch; spectral smoothing; modified EMD; children's speech; decomposition
DOI
Not available
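Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809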
Abstract
In this paper, we present a novel front-end speech parameterization approach that is more robust to pitch variations than the most commonly used technique. Earlier works have shown that insufficient smoothing of the magnitude spectrum leads to pitch-induced distortions. These distortions, in turn, degrade speech recognition performance, especially for high-pitched child speakers. To overcome this shortcoming, the short-time magnitude spectrum is first decomposed into several components using a modified version of empirical mode decomposition (EMD). Next, the lowest-order component is discarded and the spectrum is reconstructed from the remaining higher-order modes, which yields a sufficiently smoothed spectrum. Mel-frequency cepstral coefficients (MFCCs) are then extracted from the smoothed spectra. The signal-domain analyses presented in this paper demonstrate that including the proposed spectral smoothing module significantly reduces the adverse effects of pitch variations. To validate this statistically, an automatic speech recognition (ASR) system is developed using speech data from adult speakers. To simulate large pitch differences, evaluations are performed on a test set consisting of speech data from child speakers. Including the proposed spectral smoothing module yields a relative improvement of 12% over the baseline system, which employs deep neural network (DNN) based acoustic modeling.
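The smoothing step described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses plain EMD rather than their modified EMD, whose details are not given here. It also uses piecewise-linear envelopes instead of the usual cubic splines and a fixed number of sifting passes instead of a stopping criterion. A synthetic magnitude spectrum stands in for a real speech frame. All function names (`emd`, `smooth_spectrum`, `roughness`, `local_extrema`, `envelope`) are hypothetical.

```python
import math

# ASSUMPTION: plain EMD with linear envelopes and fixed sifting, NOT the
# paper's modified EMD; this only illustrates the "drop the first IMF" idea.


def local_extrema(x):
    """Return indices of interior local maxima and minima of the sequence x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, idx):
    """Piecewise-linear envelope through the points x[i] for i in idx.

    The first and last samples are added as knots: a crude boundary
    treatment (standard EMD usually uses cubic splines and end-point rules).
    """
    knots = [0] + idx + [len(x) - 1]
    env = [0.0] * len(x)
    for a, b in zip(knots[:-1], knots[1:]):
        for i in range(a, b + 1):
            t = (i - a) / (b - a)
            env[i] = x[a] + t * (x[b] - x[a])
    return env


def emd(x, max_imfs=4, n_sift=10):
    """Split x into IMFs (fastest oscillation first) plus a residue."""
    residue = list(x)
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few oscillations left to extract another IMF
        h = list(residue)
        for _ in range(n_sift):  # ASSUMPTION: fixed pass count, no stopping rule
            maxima, minima = local_extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            upper = envelope(h, maxima)
            lower = envelope(h, minima)
            # Remove the local mean of the upper and lower envelopes.
            h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
        imfs.append(h)
        residue = [r - v for r, v in zip(residue, h)]
    return imfs, residue


def smooth_spectrum(mag, n_discard=1, **emd_kwargs):
    """Discard the first n_discard IMFs and rebuild the spectrum from the rest."""
    imfs, residue = emd(mag, **emd_kwargs)
    smoothed = list(residue)
    for imf in imfs[n_discard:]:
        smoothed = [s + v for s, v in zip(smoothed, imf)]
    return smoothed


def roughness(x):
    """Sum of squared first differences: a simple measure of spectral ripple."""
    return sum((b - a) ** 2 for a, b in zip(x[:-1], x[1:]))


# Synthetic stand-in for one frame's magnitude spectrum: a slowly varying
# envelope plus a fast ripple mimicking pitch harmonics every 8 bins.
N = 128
mag = [2 + math.cos(2 * math.pi * k / N) + 0.3 * math.cos(2 * math.pi * k / 8)
       for k in range(N)]
smoothed = smooth_spectrum(mag)
print(f"roughness before: {roughness(mag):.3f}, after: {roughness(smoothed):.3f}")
```

The IMFs and the residue sum back to the input. Discarding the first IMF is therefore equivalent to subtracting the fastest spectral oscillation, which is where pitch-harmonic ripple would be expected to appear. In a full front-end, the mel filterbank, logarithm and DCT stages that produce the MFCCs would then operate on `smoothed`. Those stages are not shown here.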
Pages: 242-246
Page count: 5