Variance based time-frequency mask estimation for unsupervised speech enhancement

Cited by: 4
Authors
Saleem, Nasir [1, 2]
Khattak, Muhammad Irfan [2]
Witjaksono, Gunawan [3]
Ahmad, Gulzar [2]
Affiliations
[1] Gomal Univ, Fac Engn & Technol, Dept Elect Engn, Dera Ismail Khan 29050, Pakistan
[2] Univ Engn & Technol, Dept Elect Engn, Peshawar 25000, Pakistan
[3] Univ Teknol PETRONAS, Dept Elect & Elect Engn, Seri Iskandar, Malaysia
Keywords
A priori SNR estimation; Speech enhancement; Time-frequency masking; Variance-based features; Wiener gain; Intelligibility; Speech quality; NOISE-ESTIMATION; RESIDUAL NOISE; ACOUSTIC NOISE; BINARY; REDUCTION; ALGORITHM; NETWORKS; SPECTRUM;
DOI
10.1007/s11042-019-08032-y
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A variance-based two-dimensional time-frequency mask estimation method for unsupervised speech enhancement is proposed to improve speech quality and intelligibility by reducing the low-frequency residual noise distortion in noisy speech signals. Unlike conventional speech enhancement methods, the proposed method reduces the residual noise distortion by combining a less aggressive Wiener gain with a variance-based two-dimensional time-frequency mask in a two-stage speech enhancement method. In the first stage, the less aggressive Wiener gain with a modified a priori signal-to-noise ratio (SNR) estimate is applied to the input noisy speech to obtain a pre-processed speech signal with reduced noise. In the second stage, variance-based features are extracted from the pre-processed speech and compared to a nonparametric adaptive threshold to construct a two-dimensional time-frequency mask. The estimated mask is then applied to the pre-processed speech from the first stage to suppress the annoying residual noise distortion. A comparative performance study is included to demonstrate the effectiveness of the proposed method in various noisy conditions. The experimental results showed large improvements in terms of the perceptual evaluation of speech quality (PESQ), segmental SNR (SegSNR), residual noise distortion (BAK) and speech distortion (SIG) over those achieved with competing methods at different input SNRs. To assess the intelligibility of the enhanced speech in different noisy conditions, the short-time objective intelligibility (STOI) measure is used, which confirmed the better performance of the proposed method in terms of speech intelligibility. The time-varying spectral analysis validated a significant reduction of the residual noise components in the enhanced speech.
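The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the modified a priori SNR estimator, the variance features, or the adaptive threshold, so this sketch substitutes the standard decision-directed a priori SNR estimate, a gain floor as a stand-in for the "less aggressive" Wiener gain, a local variance over a small time-frequency neighbourhood, and the median of that variance as a data-driven threshold. All function names and parameters (`wiener_gain_dd`, `local_variance`, `variance_mask`, `enhance`, `alpha`, `gain_floor`, `half_width`) are hypothetical, and the noise power is assumed to be known.

```python
import numpy as np


def wiener_gain_dd(noisy_power, noise_power, alpha=0.98, gain_floor=0.1):
    """Stage 1 (assumed form): Wiener gain from a decision-directed a priori SNR.

    noisy_power: (frames, bins) power spectrogram of the noisy speech.
    noise_power: (bins,) noise power estimate, assumed known for this sketch.
    """
    n_frames, n_bins = noisy_power.shape
    gain = np.empty((n_frames, n_bins))
    prev_clean_power = np.zeros(n_bins)
    for t in range(n_frames):
        post_snr = noisy_power[t] / noise_power
        # Decision-directed estimate: previous clean-speech power mixed with
        # the half-wave rectified (a posteriori SNR - 1) of the current frame.
        prio_snr = (alpha * prev_clean_power / noise_power
                    + (1.0 - alpha) * np.maximum(post_snr - 1.0, 0.0))
        # The floor keeps the gain from becoming too aggressive (assumption).
        g = np.maximum(prio_snr / (1.0 + prio_snr), gain_floor)
        gain[t] = g
        prev_clean_power = (g ** 2) * noisy_power[t]
    return gain


def local_variance(x, half_width=1):
    """Variance of x over a (2*half_width + 1)^2 time-frequency neighbourhood."""
    k = 2 * half_width + 1
    padded = np.pad(x, half_width, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.var(axis=(-2, -1))


def variance_mask(pre_mag, half_width=1):
    """Stage 2 (assumed form): binary mask from local variance vs. its median."""
    var = local_variance(pre_mag, half_width)
    threshold = np.median(var)  # stand-in for the nonparametric adaptive threshold
    return (var > threshold).astype(float)


def enhance(noisy_mag, noise_power):
    """Apply the Wiener gain, then mask the pre-processed magnitude spectrogram."""
    gain = wiener_gain_dd(noisy_mag ** 2, noise_power)
    pre_mag = gain * noisy_mag
    return variance_mask(pre_mag) * pre_mag
```

In this sketch `enhance` expects a magnitude spectrogram of shape (frames, bins), for example from a short-time Fourier transform; the enhanced magnitude would then be recombined with the noisy phase before inversion. A hard binary mask is used only for brevity and can itself introduce artifacts, so a soft or floored mask may be preferable in practice.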
Pages: 31867-31891
Page count: 25
Related papers
50 in total
  • [31] INVERTIBLE DNN-BASED NONLINEAR TIME-FREQUENCY TRANSFORM FOR SPEECH ENHANCEMENT
    Takeuchi, Daiki
    Yatabe, Kohei
    Koizumi, Yuma
    Oikawa, Yasuhiro
    Harada, Noboru
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6644 - 6648
  • [32] Underdetermined blind separation of convolutive mixtures of speech using time-frequency mask and mixing matrix estimation
    Blin, A
    Araki, S
    Makino, S
    [J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2005, E88A (07) : 1693 - 1700
  • [33] AUGMENTED TIME-FREQUENCY MASK ESTIMATION IN CLUSTER-BASED SOURCE SEPARATION ALGORITHMS
    Luo, Yi
    Mesgarani, Nima
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 710 - 714
  • [34] SIMULTANEOUS OPTIMIZATION OF FORGETTING FACTOR AND TIME-FREQUENCY MASK FOR BLOCK ONLINE MULTI-CHANNEL SPEECH ENHANCEMENT
    Togami, Masahito
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 2702 - 2706
  • [35] Joint Time-Frequency Segmentation Algorithm for Transient Speech Decomposition and Speech Enhancement
    Tantibundhit, Charturong
    Pernkopf, Franz
    Kubin, Gernot
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06) : 1417 - 1428
  • [36] Speech enhancement with natural sounding residual noise based on connected time-frequency speech presence regions
    Sorensen, KV
    Andersen, SV
    [J]. EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2005, 2005 (18) : 2954 - 2964
  • [37] An Effective Target Speech Enhancement with Single Acoustic Vector Sensor Based on the Speech Time-Frequency Sparsity
    Zou, Y. X.
    Wang, Y. Q.
    Wang, Peng
    Ritz, C. H.
    Xi, Jiangtao
    [J]. 2014 19TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2014, : 547 - 551
  • [38] Speech Enhancement with Natural Sounding Residual Noise Based on Connected Time-Frequency Speech Presence Regions
    Karsten Vandborg Sørensen
    Søren Vang Andersen
    [J]. EURASIP Journal on Advances in Signal Processing, 2005
  • [39] PHASE RECONSTRUCTION METHOD BASED ON TIME-FREQUENCY DOMAIN HARMONIC STRUCTURE FOR SPEECH ENHANCEMENT
    Wakabayashi, Yukoh
    Fukumori, Takahiro
    Nakayama, Masato
    Nishiura, Takanobu
    Yamashita, Yoichi
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5560 - 5564
  • [40] TIME-FREQUENCY MASKING-BASED SPEECH ENHANCEMENT USING GENERATIVE ADVERSARIAL NETWORK
    Soni, Meet H.
    Shah, Neil
    Patil, Hemant A.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5039 - 5043