Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition

Cited by: 0

Authors:

Jose A. Gonzalez

Angel M. Gómez

Antonio M. Peinado

Ning Ma

Jon Barker

Affiliations:

[1] University of Sheffield, Department of Computer Science

[2] Department of Signal Theory, Telematics and Communications

Source:

Circuits, Systems, and Signal Processing | 2017 / Vol. 36

Keywords:

Speech recognition; Noise robustness; Feature compensation; Noise model estimation; Missing data imputation

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

An effective way to increase noise robustness in automatic speech recognition (ASR) systems is feature enhancement based on an analytical distortion model that describes the effects of noise on the speech features. One such distortion model, reported to achieve a good trade-off between accuracy and simplicity, is the masking model. Under this model, speech distortion caused by environmental noise is seen as a spectral mask and, as a result, noisy speech features can be either reliable (speech is not masked by noise) or unreliable (speech is masked). In this paper, we present a detailed overview of this model and its applications to noise-robust ASR. First, using the masking model, we derive a spectral reconstruction technique aimed at enhancing the noisy speech features. Two problems must be solved in order to perform spectral reconstruction using the masking model: (1) mask estimation, i.e., determining the reliability of the noisy features, and (2) feature imputation, i.e., estimating speech for the unreliable features. Unlike missing data imputation techniques, where the two problems are treated as independent, our technique addresses them jointly by exploiting a priori knowledge of the speech and noise sources in the form of a statistical model. Second, we propose an algorithm for estimating the noise model required by the feature enhancement technique. The proposed algorithm fits a Gaussian mixture model to the noise by iteratively maximising the likelihood of the noisy speech signal, so that noise can be estimated even during speech-dominated frames. A comprehensive set of experiments carried out on the Aurora-2 and Aurora-4 databases shows that the proposed method achieves significant improvements over the baseline system and over other similar missing data imputation techniques.
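The reliable/unreliable distinction described in the abstract can be illustrated with a small sketch. The abstract does not give the model's equations, so this assumes the commonly used log-max form of the masking model, in which each noisy log-spectral feature is approximately the larger of the speech and noise features. All function names are hypothetical, the mask is an oracle computed from known speech and noise, and the bounded imputation with a fixed `prior_mean` is a simplified stand-in for a statistical speech model, not the joint estimator proposed in the paper.

```python
import math


def exact_log_mix(speech, noise):
    """Log of the summed powers, log(e^x + e^n), per frequency channel.

    Assumes speech and noise powers add (an assumption of this sketch).
    """
    return [max(x, n) + math.log1p(math.exp(-abs(x - n)))
            for x, n in zip(speech, noise)]


def masking_model(speech, noise):
    """Log-max approximation assumed here: y is roughly max(x, n)."""
    return [max(x, n) for x, n in zip(speech, noise)]


def oracle_mask(speech, noise):
    """True where speech dominates (reliable), False where noise masks it."""
    return [x > n for x, n in zip(speech, noise)]


def bounded_imputation(noisy, mask, prior_mean):
    """Keep reliable features; replace unreliable ones with a prior guess.

    Under the masking model the hidden speech value cannot exceed the
    observation, so the (hypothetical) prior mean is clipped from above.
    """
    return [y if reliable else min(y, mu)
            for y, reliable, mu in zip(noisy, mask, prior_mean)]


# Toy log-spectral features for three frequency channels.
speech = [5.0, 1.0, 3.0]
noise = [2.0, 4.0, 3.5]
prior_mean = [4.0, 2.0, 6.0]

noisy = masking_model(speech, noise)                   # [5.0, 4.0, 3.5]
mask = oracle_mask(speech, noise)                      # [True, False, False]
estimate = bounded_imputation(noisy, mask, prior_mean)  # [5.0, 2.0, 3.5]
```

In this toy case the log-max approximation differs from the exact log-sum by at most log 2 per channel, which is one way to see the accuracy/simplicity trade-off the abstract mentions. In practice the mask is not known and has to be estimated, which is the first of the two problems the abstract lists.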


Pages: 3731-3760

Page count: 29

