Speech enhancement by Bayesian estimation of clean speech modeled as super Gaussian given a priori knowledge of phase

Cited by: 2

Authors:

Vanambathina, Sunnydayal [1]

Kumar, T. Kishore [1]

Affiliation:

[1] Natl Inst Technol Warangal, Dept Elect & Commun Engn, Warangal 506004, Telangana, India

Source:

SPEECH COMMUNICATION | 2016 / Vol. 77

Keywords:

ML estimator; MAP estimator; MMSE estimator; Laplace density; von Mises distribution; Nakagami distribution; SPECTRAL MAGNITUDE ESTIMATION; SQUARE ERROR ESTIMATION; NOISE; COEFFICIENTS;

DOI:

10.1016/j.specom.2015.11.004

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, STFT-based speech enhancement algorithms that estimate short-time spectral amplitudes are proposed. These algorithms use maximum likelihood (ML), maximum a posteriori (MAP) and minimum mean square error (MMSE) estimators, with Laplace and Gaussian probability density functions (pdfs) as noise spectral amplitude priors and Nakagami and Gamma distributions as speech spectral amplitude priors. The method uses a joint MMSE estimate of the clean speech amplitude and the clean speech phase, given uncertain phase information, for improved single-channel speech enhancement. Most speech enhancement algorithms concentrate only on the frequency-domain amplitude of speech and not on the phase of the noisy speech, since it may cause undesired artifacts. In this paper, a recent phase reconstruction algorithm is used to estimate the phase of clean speech. The reconstructed phase is treated as uncertain prior knowledge when deriving a joint MMSE estimate of the Complex speech coefficients given Uncertain Phase (CUP) information. The proposed MMSE-optimal CUP estimator reduces undesired artifacts and gives satisfactory phase values between the phase of the noisy signal and the prior phase estimate. All of the above estimators are evaluated using speech signals uttered by 10 male and 10 female speakers taken from the TIMIT database. The proposed method outperforms other benchmark algorithms in terms of segmental signal-to-noise ratio (SSNR), short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ). (C) 2015 Elsevier B.V. All rights reserved.
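
For readers unfamiliar with the first of the evaluation metrics named in the abstract, the following is a minimal sketch of one common way to compute segmental SNR. It is not taken from the paper: the function name `segmental_snr`, the frame length of 256 samples, the non-overlapping framing, and the per-frame clamp to the range -10 dB to 35 dB are all assumptions chosen for illustration, and the authors' exact settings may differ.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, min_db=-10.0, max_db=35.0):
    """Mean per-frame SNR in dB over non-overlapping frames.

    frame_len, min_db and max_db are illustrative assumptions,
    not values reported in the paper.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")

    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        if signal_energy == 0.0:
            continue  # skip silent frames, where the ratio is undefined
        if error_energy == 0.0:
            snr_db = max_db  # perfect reconstruction: use the upper clamp
        else:
            snr_db = 10.0 * math.log10(signal_energy / error_energy)
        # Clamp each frame so outliers do not dominate the average.
        frame_snrs.append(min(max(snr_db, min_db), max_db))

    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)
```

Under these assumptions, an enhanced signal equal to 0.9 times the clean signal has an error of 0.1 times the clean signal in every frame, so each frame scores 10·log10(1/0.01) = 20 dB.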

Pages: 8-27

Number of pages: 20
