Enhancing Pitch Robustness of Speech Recognition System through Spectral Smoothing

Cited by: 0

Authors:

Sai, B. Tarun [1]

Yadav, Ishwar Chandra [1]

Shahnawazuddin, S. [1]

Pradhan, Gayadhar [1]

Affiliation:

[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna, Bihar, India

Source:

2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018) | 2018

Keywords:

Speech recognition; pitch mismatch; spectral smoothing; modified EMD; CHILDRENS SPEECH; DECOMPOSITION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

In this paper, we present a novel front-end speech parameterization approach that is more robust to pitch variations than the most commonly used technique. Earlier works have shown that insufficient smoothing of the magnitude spectrum leads to pitch-induced distortions. This, in turn, degrades the performance of speech recognition systems, especially for high-pitched child speakers. To overcome this shortcoming, the short-time magnitude spectrum is first decomposed into several components using a modified version of empirical mode decomposition (EMD). Next, the lowest-order component is discarded and the spectrum is reconstructed from the remaining higher-order modes, which yields a sufficiently smoothed spectrum. The Mel-frequency cepstral coefficients (MFCC) are then extracted from the smoothed spectra. The signal-domain analyses presented in this paper demonstrate that the ill effects of pitch variations are significantly reduced by including the proposed spectral smoothing module. To validate this statistically, an automatic speech recognition system is developed using speech data from adult speakers. To simulate large pitch differences, evaluations are performed on a test set consisting of speech data from child speakers. Including the proposed spectral smoothing module leads to a relative improvement of 12% over the baseline system, which employs acoustic modeling based on a deep neural network.
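The smoothing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify their modified EMD, so the code below uses a plain EMD-style sifting loop with linearly interpolated envelopes (standard EMD typically uses cubic splines), and it assumes that the "lowest-order component" is the first, fastest-oscillating mode. All function names, the fixed sifting count, and the synthetic spectrum (a smooth two-peak envelope multiplied by a ripple with 300 Hz spacing) are hypothetical choices made for illustration only. Because EMD is an additive decomposition, discarding the first mode is the same as subtracting it from the input.

```python
import numpy as np


def local_extrema(x):
    """Return indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def envelope_mean(x):
    """Mean of the upper and lower envelopes, or None if x has too few extrema.

    Assumption: envelopes are linearly interpolated through the extrema and
    held constant beyond the first and last extremum (np.interp behavior).
    """
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    bins = np.arange(len(x))
    upper = np.interp(bins, maxima, x[maxima])
    lower = np.interp(bins, minima, x[minima])
    return 0.5 * (upper + lower)


def first_mode(x, n_sift=10):
    """Extract the fastest-oscillating mode with a fixed number of sifting passes."""
    h = np.array(x, dtype=float)
    for i in range(n_sift):
        mean = envelope_mean(h)
        if mean is None:
            # No oscillation found on the first pass: the mode is zero.
            return np.zeros_like(h) if i == 0 else h
        h = h - mean
    return h


def smooth_spectrum(mag, n_sift=10):
    """Discard the first mode; what remains is the sum of all other components."""
    mag = np.asarray(mag, dtype=float)
    return mag - first_mode(mag, n_sift)


def synthetic_spectrum(n_bins=257, f_max=4000.0, f0=300.0):
    """Hypothetical test signal: smooth envelope times a pitch-like ripple."""
    freq = np.linspace(0.0, f_max, n_bins)
    envelope = (np.exp(-((freq - 800.0) / 600.0) ** 2)
                + 0.6 * np.exp(-((freq - 2400.0) / 700.0) ** 2))
    mag = envelope * (1.0 + 0.5 * np.cos(2.0 * np.pi * freq / f0))
    return envelope, mag


def rms(x):
    """Root-mean-square value of an array."""
    return float(np.sqrt(np.mean(np.square(x))))


if __name__ == "__main__":
    envelope, mag = synthetic_spectrum()
    smoothed = smooth_spectrum(mag)
    print("RMS error before smoothing:", rms(mag - envelope))
    print("RMS error after smoothing: ", rms(smoothed - envelope))
```

In a full front end, the smoothed spectrum of each frame would then go through the usual Mel filterbank, logarithm, and DCT stages to produce MFCCs. Note that this sketch does not guarantee a non-negative output, and the constant extrapolation at the band edges leaves some residual error there.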


Pages: 242-246

Number of pages: 5

50 records in total

[41] Improving Robustness to Compressed Speech in Speaker Recognition
McLaren, Mitchell
Abrash, Victor
Graciarena, Martin
Lei, Yun
Pesan, Jan
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3665 - 3669
[42] Enhancing the Recognition of Children's Speech on Acoustically Mismatched ASR System
Shahnawazuddin, S.
Kathania, Hemant Kumar
Sinha, Rohit
TENCON 2015 - 2015 IEEE REGION 10 CONFERENCE, 2015,
[43] Enhancing the National Blood Transfusion Services Through Speech-Recognition Technology
Hesen, E.
TRANSFUSION, 2016, 56 : 249A - 249A
[44] ENHANCING PRIVACY THROUGH DOMAIN ADAPTIVE NOISE INJECTION FOR SPEECH EMOTION RECOGNITION
Feng, Tiantian
Hashemi, Hanieh
Annavaram, Murali
Narayanan, Shrikanth S.
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7702 - 7706
[45] Sequential Randomized Smoothing for Adversarially Robust Speech Recognition
Olivier, Raphael
Raj, Bhiksha
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6372 - 6386
[46] Enhancing speech emotion recognition through deep learning and handcrafted feature fusion
Eris, Fatma Gunes
Akbal, Erhan
APPLIED ACOUSTICS, 2024, 222
[47] Augmented higher cognition: Enhancing speech recognition through neural activity measures
Viirre, E
Jung, TP
FOUNDATIONS OF AUGMENTED COGNITION, VOL 11, 2005, : 1122 - 1131
[48] Automatic speech recognition system with pitch dependent features for Punjabi language on KALDI toolkit
Guglani, Jyoti
Mishra, A. N.
APPLIED ACOUSTICS, 2020, 167
[49] Towards improving speech detection robustness for speech recognition in adverse conditions
Karray, L
Martin, A
SPEECH COMMUNICATION, 2003, 40 (03) : 261 - 276
[50] A Study on the Robustness of Pitch Range Estimation from Brief Speech Segments
Peng, Wenjie
Fu, Kaiqi
Zhang, Wei
Xie, Yanlu
Zhang, Jinsong
PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 172 - 176
