Variance based time-frequency mask estimation for unsupervised speech enhancement

Cited by: 4
Authors
Saleem, Nasir [1, 2]
Khattak, Muhammad Irfan [2]
Witjaksono, Gunawan [3]
Ahmad, Gulzar [2]
Affiliations
[1] Gomal Univ, Fac Engn & Technol, Dept Elect Engn, Dera Ismail Khan 29050, Pakistan
[2] Univ Engn & Technol, Dept Elect Engn, Peshawar 25000, Pakistan
[3] Univ Teknol PETRONAS, Dept Elect & Elect Engn, Seri Iskandar, Malaysia
Keywords
A priori SNR estimation; Speech enhancement; Time-frequency masking; Variance-based features; Wiener gain; Intelligibility; Speech quality; NOISE-ESTIMATION; RESIDUAL NOISE; ACOUSTIC NOISE; BINARY; REDUCTION; ALGORITHM; NETWORKS; SPECTRUM
DOI
10.1007/s11042-019-08032-y
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A variance-based two-dimensional time-frequency mask estimation method for unsupervised speech enhancement is proposed to improve speech quality and intelligibility by reducing the low-frequency residual noise distortion in noisy speech signals. Unlike conventional speech enhancement methods, the proposed method reduces the residual noise distortion by combining a less aggressive Wiener gain with a variance-based two-dimensional time-frequency mask in a two-stage scheme. In the first stage, the less aggressive Wiener gain with a modified a priori signal-to-noise ratio (SNR) estimate is applied to the input noisy speech to obtain a pre-processed speech signal with reduced noise. In the second stage, variance-based features are extracted from the pre-processed speech and compared to a nonparametric adaptive threshold to construct a two-dimensional time-frequency mask. The estimated mask is then applied to the pre-processed speech from the first stage to suppress the annoying residual noise distortion. A comparative performance study demonstrates the effectiveness of the proposed method in various noisy conditions. The experimental results show large improvements in perceptual evaluation of speech quality (PESQ), segmental SNR (SegSNR), residual noise distortion (BAK) and speech distortion (SIG) over competing methods at different input SNRs. To measure how well the enhanced speech is understood in different noisy conditions, the short-time objective intelligibility (STOI) measure is used, which confirms the better performance of the proposed method in terms of speech intelligibility. Time-varying spectral analysis confirms a significant reduction of the residual noise components in the enhanced speech.
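The two-stage structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the modified a priori SNR estimator, the variance features, or the nonparametric threshold. The sketch assumes the standard decision-directed a priori SNR estimate, a gain floor as a stand-in for the "less aggressive" Wiener gain, the local variance over a small time-frequency neighborhood as the feature, and the median of all features as the data-driven threshold. All function and parameter names (`wiener_stage`, `variance_mask`, `enhance`, `alpha`, `gain_floor`, `radius`, `mask_floor`) are hypothetical, and the noise power per frequency bin is assumed to be known.

```python
from statistics import median, pvariance


def wiener_stage(noisy_power, noise_power, alpha=0.98, gain_floor=0.1):
    """Stage 1 (sketch): floored Wiener gain with a decision-directed a priori SNR.

    noisy_power: list of frames, each a list of |Y(t, k)|^2 values.
    noise_power: per-bin noise power estimates (assumed known and positive).
    Returns the pre-processed power spectrogram and the applied gains.
    """
    n_bins = len(noise_power)
    prev_clean = [0.0] * n_bins  # enhanced power of the previous frame
    out, gains = [], []
    for frame in noisy_power:
        out_frame, gain_frame = [], []
        for k in range(n_bins):
            post_snr = frame[k] / noise_power[k]
            # Decision-directed estimate (assumption, not the paper's modified one).
            prior_snr = (alpha * prev_clean[k] / noise_power[k]
                         + (1.0 - alpha) * max(post_snr - 1.0, 0.0))
            # The floor keeps the gain from being overly aggressive.
            gain = max(prior_snr / (1.0 + prior_snr), gain_floor)
            gain_frame.append(gain)
            out_frame.append(gain * gain * frame[k])
        prev_clean = out_frame
        out.append(out_frame)
        gains.append(gain_frame)
    return out, gains


def variance_mask(power, radius=1, mask_floor=0.0):
    """Stage 2 (sketch): binary mask from local variance vs. a median threshold."""
    n_frames, n_bins = len(power), len(power[0])
    features = []
    for t in range(n_frames):
        row = []
        for k in range(n_bins):
            # Neighborhood of up to (2*radius + 1)^2 time-frequency cells.
            patch = [power[i][j]
                     for i in range(max(0, t - radius), min(n_frames, t + radius + 1))
                     for j in range(max(0, k - radius), min(n_bins, k + radius + 1))]
            row.append(pvariance(patch))
        features.append(row)
    # Data-driven threshold with no tuning parameter (assumed choice).
    threshold = median(v for row in features for v in row)
    mask = [[1.0 if v > threshold else mask_floor for v in row] for row in features]
    return mask, threshold


def enhance(noisy_power, noise_power):
    """Apply stage 1, then mask the pre-processed spectrogram from stage 1."""
    pre, _ = wiener_stage(noisy_power, noise_power)
    mask, _ = variance_mask(pre)
    return [[m * p for m, p in zip(m_row, p_row)] for m_row, p_row in zip(mask, pre)]
```

The sketch operates on power spectrograms only; in practice the combined gain would be applied to the complex short-time spectrum and the signal resynthesized with the noisy phase. A nonzero `mask_floor` would attenuate rather than zero out low-variance cells, which may reduce musical-noise artifacts.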
Pages: 31867-31891
Page count: 25
Related papers
50 in total
  • [1] Variance based time-frequency mask estimation for unsupervised speech enhancement
    Nasir Saleem
    Muhammad Irfan Khattak
    Gunawan Witjaksono
    Gulzar Ahmad
    [J]. Multimedia Tools and Applications, 2019, 78 : 31867 - 31891
  • [2] Noise estimation based on time-frequency correlation for speech enhancement
    Yuan, Wenhao
    Lin, Jiajun
    An, Wei
    Wang, Yu
    Chen, Ning
    [J]. APPLIED ACOUSTICS, 2013, 74 (05) : 770 - 781
  • [3] Spectrographic Speech Mask Estimation Using the Time-Frequency Correlation of Speech Presence
    Zhan, Ge
    Huang, Zhaoqiong
    Ying, Dongwen
    Pan, Jielin
    Yan, Yonghong
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2287 - 2291
  • [4] Time-frequency mask estimation-based speech enhancement using deep encoder-decoder neural network
    SHI Wenhua
    ZHANG Xiongwei
    ZOU Xia
    SUN Meng
    LI Li
    REN Zhengbing
    [J]. Chinese Journal of Acoustics, 2021, 40 (01) : 141 - 154
  • [5] Time-Frequency Mask-based Speech Enhancement using Convolutional Generative Adversarial Network
    Shah, Neil
    Patil, Hemant A.
    Soni, Meet H.
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1246 - 1251
  • [6] Binaural Speech Separation Based on the Time-Frequency Binary Mask
    Mahmoodzadeh, A.
    Abutalebi, H. R.
    Soltanian-Zadeh, H.
    Sheikhzadeh, H.
    [J]. 2012 SIXTH INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS (IST), 2012, : 848 - 853
  • [7] Speech Enhancement in Low SNR Environments by Designing a Time-Frequency Binary Mask
    Cheng, Shuai
    Zhang, Haijian
    Hua, Guang
    [J]. 2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2018
  • [8] SPEECH ENHANCEMENT BASED ON JOINT TIME-FREQUENCY SEGMENTATION
    Tantibundhit, C.
    Pernkopf, F.
    Kubin, G.
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4673 - +
  • [9] Speech Feature Enhancement based on Time-frequency Analysis
    Do, Duc-Hao
    Chau, Thanh-Duc
    Tran, Thai-Son
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (08)
  • [10] A new time-frequency binary mask estimation method based on convex optimization of speech power
    Bao, Feng
    Abdulla, Waleed H.
    [J]. SPEECH COMMUNICATION, 2018, 97 : 51 - 65