A Novel Expectation-Maximization Framework for Speech Enhancement in Non-Stationary Noise Environments

Cited by: 7

Authors:

Lun, Daniel P. K. [1]

Shen, Tak-Wai [1]

Ho, K. C. [2]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Ctr Signal Proc, Hong Kong, Hong Kong, Peoples R China

[2] Univ Missouri, Dept Elect & Comp Engn, Columbia, MO 65211 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2014 / Vol. 22 / No. 02

Keywords:

Cepstral analysis; expectation-maximization; speech enhancement; IMAGE-RECONSTRUCTION; EM ALGORITHM; GAIN;

DOI:

10.1109/TASLP.2013.2290497

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Voiced speech has a quasi-periodic nature that allows it to be compactly represented in the cepstral domain. This feature distinguishes it from noise. Recently, the temporal cepstrum smoothing (TCS) algorithm was proposed and shown to be effective for speech enhancement in non-stationary noise environments. However, the lack of an automatic parameter updating mechanism limits its adaptability to noisy speech with abrupt changes in SNR across time frames or frequency components. In this paper, an improved speech enhancement algorithm based on a novel expectation-maximization (EM) framework is proposed. The new algorithm starts with the traditional TCS method, which gives the initial guess of the periodogram of the clean speech. This guess is then applied to a norm regularizer in the M-step of the EM framework to estimate the true power spectrum of the original speech. That estimate in turn enables the estimation of the a-priori SNR, which is used in the E-step (in effect a log-MMSE gain function) to refine the estimate of the clean speech periodogram. The M-step and E-step alternate until convergence. A notable improvement of the proposed algorithm over the traditional TCS method is its adaptability to changes, even abrupt ones, in the SNR of the noisy speech. The performance of the proposed algorithm is evaluated using standard measures on a large set of speech and noise signals. Evaluation results show that a significant improvement is achieved over conventional approaches, especially in non-stationary noise environments where most conventional algorithms fail to perform.
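
The abstract describes the E-step as a log-MMSE gain function driven by the a-priori SNR. The following is a minimal sketch of that step only, using the textbook log-spectral amplitude gain, G = xi / (1 + xi) * exp(0.5 * E1(v)) with v = xi * gamma / (1 + xi). It is not the authors' implementation: the function names (exp1, log_mmse_gain, e_step), the floor values, and the per-bin list layout are assumptions made for illustration, and the TCS initialization and the regularized M-step are not shown.

```python
import math

EULER_GAMMA = 0.5772156649015329
XI_MIN = 1e-3      # assumed floor on the a-priori SNR (hypothetical value)
V_MIN = 1e-10      # assumed floor that keeps E1 finite


def exp1(x, tol=1e-12, max_iter=200):
    """Exponential integral E1(x) for x > 0, pure-Python approximation."""
    if x <= 0.0:
        raise ValueError("exp1 is defined here for x > 0 only")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, max_iter):
            term *= -x / k            # term is now (-x)^k / k!
            delta = -term / k
            total += delta
            if abs(delta) < tol * abs(total):
                break
        return total
    # Continued fraction evaluated with the modified Lentz method
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < tol:
            break
    return h * math.exp(-x)


def log_mmse_gain(xi, gamma):
    """Log-spectral amplitude gain for a-priori SNR xi and a-posteriori SNR gamma."""
    v = max(xi * gamma / (1.0 + xi), V_MIN)
    return xi / (1.0 + xi) * math.exp(0.5 * exp1(v))


def e_step(noisy_periodogram, speech_psd, noise_psd):
    """Refine the clean-speech periodogram, one frequency bin at a time."""
    refined = []
    for y2, ps, pn in zip(noisy_periodogram, speech_psd, noise_psd):
        xi = max(ps / pn, XI_MIN)     # a-priori SNR from the speech PSD estimate
        gamma = y2 / pn               # a-posteriori SNR from the noisy periodogram
        gain = log_mmse_gain(xi, gamma)
        refined.append(gain * gain * y2)
    return refined
```

In a full loop, the output of e_step would be fed back into the M-step to update the speech power spectrum estimate, and the two steps would alternate until the estimates stop changing.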

Pages: 335-346

Page count: 12
