Variance based time-frequency mask estimation for unsupervised speech enhancement

Cited by: 4

Authors:

Saleem, Nasir ^{[1,2]}

Khattak, Muhammad Irfan ^{[2]}

Witjaksono, Gunawan ^{[3]}

Ahmad, Gulzar ^{[2]}

Affiliations:

[1] Gomal Univ, Fac Engn & Technol, Dept Elect Engn, Dera Ismail Khan 29050, Pakistan

[2] Univ Engn & Technol, Dept Elect Engn, Peshawar 25000, Pakistan

[3] Univ Teknol PETRONAS, Dept Elect & Elect Engn, Seri Iskandar, Malaysia

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2019 / Vol. 78 / Issue 22

Keywords:

A priori SNR estimation; Speech enhancement; Time-frequency masking; Variance-based features; Wiener gain; Intelligibility; Speech quality; NOISE-ESTIMATION; RESIDUAL NOISE; ACOUSTIC NOISE; BINARY; REDUCTION; ALGORITHM; NETWORKS; SPECTRUM;

DOI:

10.1007/s11042-019-08032-y

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

A variance-based two-dimensional time-frequency mask estimation method for unsupervised speech enhancement is proposed to improve speech quality and intelligibility by reducing low-frequency residual noise distortion in noisy speech signals. Unlike conventional speech enhancement methods, the proposed method reduces residual noise distortion by combining a less aggressive Wiener gain with a variance-based two-dimensional time-frequency mask in a two-stage scheme. In the first stage, the less aggressive Wiener gain with a modified a priori signal-to-noise ratio (SNR) estimate is applied to the input noisy speech to obtain a pre-processed speech signal with reduced noise. In the second stage, variance-based features are extracted from the pre-processed speech and compared to a nonparametric adaptive threshold to construct a two-dimensional time-frequency mask. The estimated mask is then applied to the pre-processed speech from the first stage to suppress the annoying residual noise distortion. A comparative performance study demonstrates the effectiveness of the proposed method in various noisy conditions. The experimental results show large improvements in perceptual evaluation of speech quality (PESQ), segmental SNR (SegSNR), residual noise distortion (BAK) and speech distortion (SIG) over competing methods at different input SNRs. To measure how well the enhanced speech is understood in different noisy conditions, the short-time objective intelligibility (STOI) measure is used, which confirms the better performance of the proposed method in terms of speech intelligibility. Time-varying spectral analysis confirms a significant reduction of the residual noise components in the enhanced speech.
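The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the modified a priori SNR estimator, the exact variance features, or the adaptive threshold, so the sketch substitutes the standard decision-directed estimator, a gain floor as a stand-in for the "less aggressive" Wiener gain, a local variance of the log-power spectrum as the feature, and a per-frequency median as the data-driven threshold. All function names and parameters (`wiener_stage`, `variance_mask_stage`, `alpha`, `gain_floor`, `half_width`) are hypothetical, and the noise power `noise_psd` is assumed to be known.

```python
import numpy as np


def wiener_stage(noisy_mag, noise_psd, alpha=0.98, gain_floor=0.1):
    """Stage 1 (assumed form): floored Wiener gain on a magnitude spectrogram.

    noisy_mag: (freq, frames) magnitude spectrogram of the noisy speech.
    noise_psd: (freq,) noise power per frequency bin, assumed known here.
    """
    n_freq, n_frames = noisy_mag.shape
    enhanced = np.zeros_like(noisy_mag)
    prev_clean_power = np.zeros(n_freq)
    for t in range(n_frames):
        post_snr = noisy_mag[:, t] ** 2 / noise_psd
        # Standard decision-directed a priori SNR estimate
        # (an assumption; the paper uses a modified estimator).
        prio_snr = (alpha * prev_clean_power / noise_psd
                    + (1.0 - alpha) * np.maximum(post_snr - 1.0, 0.0))
        # The gain floor limits attenuation, standing in for "less aggressive".
        gain = np.maximum(prio_snr / (1.0 + prio_snr), gain_floor)
        enhanced[:, t] = gain * noisy_mag[:, t]
        prev_clean_power = enhanced[:, t] ** 2
    return enhanced


def local_variance(x, half_width=1):
    """Variance of x over a square time-frequency neighbourhood of each bin."""
    k = 2 * half_width + 1
    padded = np.pad(x, half_width, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.var(axis=(-2, -1))


def variance_mask_stage(pre_mag, half_width=1):
    """Stage 2 (assumed form): binary mask from local log-power variance."""
    log_power = np.log(pre_mag ** 2 + 1e-12)
    var_map = local_variance(log_power, half_width)
    # Per-frequency median as a simple data-driven threshold (an assumption).
    threshold = np.median(var_map, axis=1, keepdims=True)
    mask = (var_map > threshold).astype(pre_mag.dtype)
    return mask * pre_mag, mask
```

In use, the output of `wiener_stage` would be passed to `variance_mask_stage`, and the masked magnitude would be combined with the noisy phase before an inverse short-time Fourier transform. A binary mask is only one possible choice here; the abstract does not say whether the paper's mask is binary or soft.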


Pages: 31867-31891

Number of pages: 25

