Enhancing Pitch Robustness of Speech Recognition System through Spectral Smoothing

Authors
Sai, B. Tarun [1]
Yadav, Ishwar Chandra [1]
Shahnawazuddin, S. [1]
Pradhan, Gayadhar [1]
Affiliation
[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna, Bihar, India
Keywords
Speech recognition; pitch mismatch; spectral smoothing; modified EMD; children's speech; decomposition
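DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification code
0808; 0809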
Abstract
In this paper, we present a novel approach to front-end speech parameterization that is more robust to pitch variations than the most commonly used technique. Earlier works have shown that insufficient smoothing of the magnitude spectrum leads to pitch-induced distortions. This, in turn, results in poor performance of speech recognition systems, especially for high-pitched child speakers. To overcome this shortcoming, the short-time magnitude spectrum is first decomposed into several components using a modified version of empirical mode decomposition (EMD). Next, the lowest-order component is discarded and the spectrum is reconstructed from the remaining higher-order modes, which smooths the spectrum sufficiently. Mel-frequency cepstral coefficients (MFCC) are then extracted from the smoothed spectra. The signal-domain analyses presented in this paper demonstrate that the ill effects of pitch variations are significantly reduced by including the proposed spectral smoothing module. To validate this statistically, an automatic speech recognition system is developed using speech data from adult speakers. To simulate large pitch differences, evaluations are performed on a test set consisting of speech data from child speakers. Including the proposed spectral smoothing module yields a relative improvement of 12% over the baseline system, which employs acoustic modeling based on a deep neural network.
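The smoothing step described in the abstract (decompose the magnitude spectrum, discard one fast-varying component, reconstruct from the rest) can be illustrated with a minimal sketch. This is not the authors' modified EMD, whose details the abstract does not give. It is a plain, simplified sifting procedure with several assumptions: envelopes use linear interpolation instead of the cubic splines common in EMD, the number of sifting passes is fixed, the "lowest-order component" is taken to be the first (fastest-oscillating) mode, and the input is a synthetic spectrum. All function names are hypothetical.

```python
import numpy as np

def local_extrema(x):
    """Indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def envelope(x, idx):
    """Envelope through the given extrema, pinned to both end points.
    Assumption: linear interpolation stands in for the usual cubic spline."""
    knots = np.concatenate(([0], idx, [len(x) - 1]))
    return np.interp(np.arange(len(x)), knots, x[knots])

def first_mode(x, n_sift=4):
    """Fastest-oscillating mode, found by a fixed number of sifting passes."""
    h = np.asarray(x, dtype=float).copy()
    sifted = False
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few oscillations left to build envelopes
        # Subtract the mean of the upper and lower envelopes.
        h = h - 0.5 * (envelope(h, maxima) + envelope(h, minima))
        sifted = True
    return h if sifted else np.zeros_like(h)

def smooth_spectrum(mag):
    """Discard the first mode and keep the sum of everything else."""
    mag = np.asarray(mag, dtype=float)
    return mag - first_mode(mag)

# Synthetic example: a slow "vocal-tract" envelope plus a fast ripple that
# stands in for pitch harmonics (both are illustrative, not real speech).
k = np.arange(257)
true_env = 1.5 + 0.5 * np.cos(2 * np.pi * k / 257)
mag = true_env + 0.2 * np.cos(2 * np.pi * k / 8)
smoothed = smooth_spectrum(mag)

err_before = np.mean(np.abs(mag - true_env))
err_after = np.mean(np.abs(smoothed - true_env))
print("mean abs deviation from envelope before:", err_before)
print("mean abs deviation from envelope after: ", err_after)
```

In a full front end, the smoothed spectrum of each frame would then pass through the Mel filterbank, logarithm, and DCT to give MFCCs. That stage is omitted here.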
Pages: 242-246
Page count: 5