Sparse Hidden Markov Models for Speech Enhancement in Non-Stationary Noise Environments

Cited by: 28
Authors
Deng, Feng [1]
Bao, Changchun [1]
Kleijn, W. Bastiaan [2]
Affiliations
[1] Beijing Univ Technol, Sch Elect Informat & Control Engn, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China
[2] Victoria Univ Wellington, Sch Engn & Comp Sci, Commun & Signal Proc Grp, Wellington 6140, New Zealand
Funding
National Natural Science Foundation of China;
Keywords
Gain modeling; non-stationary noise; sparse autoregressive hidden Markov model (ARHMM); speech enhancement; HMM; RECOGNITION;
DOI
10.1109/TASLP.2015.2458585
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
We propose a sparse hidden Markov model (HMM)-based single-channel speech enhancement method that models the speech and noise gains accurately in non-stationary noise environments. Autoregressive models are employed to describe the speech and noise in a unified framework, and the speech and noise gains are modeled as random processes with memory. The likelihood criterion for finding the model parameters is augmented with an l_p regularization term, resulting in a sparse autoregressive HMM (SARHMM) system that encourages sparsity in the speech and noise modeling. In the SARHMM, only a small number of HMM states contribute significantly to the model of each particular observed speech segment. Because it eliminates ambiguity between noise and speech spectra, the sparsity of the speech and noise modeling helps to improve the tracking of changes in both the spectral shapes and the power levels of non-stationary noise. Using the modeled speech and noise SARHMMs, we first construct a noise estimator to estimate the noise power spectrum. Then, a Bayesian speech estimator is derived to obtain the enhanced speech signal. The subjective and objective test results indicate that the proposed speech enhancement scheme can achieve a larger segmental SNR improvement, a lower log-spectral distortion, and better speech quality in stationary noise conditions than state-of-the-art reference methods. The advantage of the new method is largest for non-stationary noise conditions.
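To illustrate why an l_p term added to a likelihood criterion can favor solutions in which only a few HMM states contribute, the following minimal Python sketch compares the penalty for a dense and a sparse set of state weights. It is not the authors' implementation: the function names, the weight vectors, the value p = 0.5, and the regularization weight lam are all assumptions chosen for illustration, and the abstract does not state which p the paper uses.

```python
def lp_penalty(weights, p):
    """Return sum_i |w_i|**p (the l_p penalty without the 1/p root)."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(w) ** p for w in weights)


def penalized_score(log_likelihood, weights, lam, p):
    """Hypothetical regularized criterion: log-likelihood minus lam * l_p penalty."""
    return log_likelihood - lam * lp_penalty(weights, p)


# Two hypothetical state-weight vectors that both sum to 1.
dense = [0.25, 0.25, 0.25, 0.25]   # every state contributes equally
sparse = [0.97, 0.01, 0.01, 0.01]  # one state dominates

# Assumed values: with p < 1 the sparse vector is penalized less.
# With p = 1 both penalties equal the sum of the weights, so there is no preference.
penalty_dense = lp_penalty(dense, 0.5)
penalty_sparse = lp_penalty(sparse, 0.5)
```

With equal log-likelihoods, `penalized_score` is higher for `sparse` than for `dense` when p = 0.5, which is the sense in which such a term encourages a small number of active states.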
Pages: 1973-1987
Number of pages: 15