Enhancing Pitch Robustness of Speech Recognition System through Spectral Smoothing

Cited by: 0

Authors:

Sai, B. Tarun [1]

Yadav, Ishwar Chandra [1]

Shahnawazuddin, S. [1]

Pradhan, Gayadhar [1]

Affiliation:

[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna, Bihar, India

Source:

2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018) | 2018

Keywords:

Speech recognition; pitch mismatch; spectral smoothing; modified EMD; children's speech; decomposition

DOI:

Not available

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

In this paper, we present a novel approach to front-end speech parameterization that is more robust to pitch variations than the most commonly used technique. Earlier works have shown that insufficient smoothing of the magnitude spectrum leads to pitch-induced distortions. This, in turn, results in poor performance of speech recognition systems, especially for high-pitched child speakers. To overcome this shortcoming, the short-time magnitude spectrum is first decomposed into several components using a modified version of empirical mode decomposition (EMD). Next, the lowest-order component is discarded and the spectrum is reconstructed from the remaining higher-order modes, which smooths it sufficiently. The Mel-frequency cepstral coefficients (MFCC) are then extracted from the smoothed spectra. The signal-domain analyses presented in this paper demonstrate that the ill effects of pitch variations are significantly reduced by including the proposed spectral smoothing module. To validate this statistically, an automatic speech recognition system is developed using speech data from adult speakers. To simulate large pitch differences, evaluations are performed on a test set consisting of speech data from child speakers. Including the proposed spectral smoothing module leads to a relative improvement of 12% over the baseline system, which employs acoustic modeling based on a deep neural network.
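The decompose, discard, and reconstruct steps described in the abstract can be illustrated with a minimal sketch. This record does not specify the paper's modified EMD, so the code below substitutes a plain EMD with linearly interpolated envelopes (classic EMD uses cubic splines). The function names `emd` and `smooth_spectrum`, the parameters `max_modes` and `sift_iters`, and the synthetic spectrum are all hypothetical, and the MFCC extraction step is omitted.

```python
import numpy as np


def _local_extrema(x):
    """Return indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope_mean(x):
    """Mean of the upper and lower envelopes, or None if extrema are too few."""
    maxima, minima = _local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    n = np.arange(len(x))
    # Assumption: linear interpolation between extrema; np.interp holds the
    # first and last extremum values flat towards the array edges.
    upper = np.interp(n, maxima, x[maxima])
    lower = np.interp(n, minima, x[minima])
    return 0.5 * (upper + lower)


def emd(x, max_modes=6, sift_iters=10):
    """Plain EMD (not the paper's modified variant).

    Returns a list of components ordered from fastest to slowest variation;
    the last entry is the residue, so the components sum to the input.
    """
    residue = np.asarray(x, dtype=float).copy()
    modes = []
    for _ in range(max_modes):
        if _envelope_mean(residue) is None:
            break  # too few extrema left to extract another mode
        h = residue.copy()
        for _ in range(sift_iters):
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m  # sifting: remove the local mean
        modes.append(h)
        residue = residue - h
    modes.append(residue)
    return modes


def smooth_spectrum(mag):
    """Discard the first (fastest-varying) component and sum the rest."""
    modes = emd(mag)
    if len(modes) == 1:
        return modes[0]  # nothing to discard
    return np.sum(modes[1:], axis=0)


if __name__ == "__main__":
    # Synthetic magnitude spectrum: slow envelope plus fast harmonic ripple.
    k = np.arange(257)
    mag = 1.0 + 0.5 * np.cos(2 * np.pi * k / 200.0) + 0.3 * np.cos(2 * np.pi * k / 8.0)
    smoothed = smooth_spectrum(mag)
    print("total variation before:", np.sum(np.abs(np.diff(mag))))
    print("total variation after: ", np.sum(np.abs(np.diff(smoothed))))
```

In this sketch the fast ripple stands in for pitch harmonics and the slow cosine for the spectral envelope; how closely this matches the behavior of the authors' modified EMD on real speech spectra is not something this record allows one to verify.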


Pages: 242-246

Page count: 5
