Speech mask estimation using the time-frequency correlation of speech presence

Cited by: 0

Authors:

Zhan, Ge [1]

Huang, Zhao-Qiong [1]

Ying, Dong-Wen [1]

Pan, Jie-Lin [1]

Yan, Yong-Hong [1]

Affiliations:

[1] Institute of Acoustics, The Chinese Academy of Sciences, Beijing, 100190, China

Source:

Ruan Jian Xue Bao/Journal of Software | 2016 / Vol. 27

Funding:

National Natural Science Foundation of China

Keywords:

Frequency correlation - Neighbor factor - On-line estimation - Posteriori probability - State transition probabilities - Time frequency - Time frequency domain - Two-dimensional (2-D)

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

This paper proposes a method for estimating the spectrographic speech mask based on a two-dimensional (2-D) correlation model. The method is motivated by the fact that the time and frequency correlations of speech presence are interwoven in the time-frequency (TF) domain. A conventional Markov chain cannot model the time and frequency correlations simultaneously in an adaptive way. The 2-D correlation model is presented to describe the correlation of speech presence in the TF domain, with speech presence and speech absence taken as the two states of the model. The time correlation is modeled by the time state-transition probability and the forward factor, while the frequency state-transition probability and the corresponding neighbor factor are defined to describe the frequency correlation. The time and frequency correlations are incorporated into the model by maximizing the Q-function. A sequential scheme is presented to estimate the parameter set online. Given the observed spectrum and the parameter set, the state matrix that maximizes the a posteriori probability is taken as the optimal estimate of the speech mask. The proposed method was compared with several well-established methods, and the experimental results confirmed its superiority. © Copyright 2016, Institute of Software, the Chinese Academy of Sciences. All rights reserved.

Citations

Pages: 64 / 68

50 entries in total

[1] Spectrographic Speech Mask Estimation Using the Time-Frequency Correlation of Speech Presence
Zhan, Ge
Huang, Zhaoqiong
Ying, Dongwen
Pan, Jielin
Yan, Yonghong
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2287 - 2291
[2] Variance based time-frequency mask estimation for unsupervised speech enhancement
Nasir Saleem
Muhammad Irfan Khattak
Gunawan Witjaksono
Gulzar Ahmad
[J]. Multimedia Tools and Applications, 2019, 78 : 31867 - 31891
[3] Variance based time-frequency mask estimation for unsupervised speech enhancement
Saleem, Nasir
Khattak, Muhammad Irfan
Witjaksono, Gunawan
Ahmad, Gulzar
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (22) : 31867 - 31891
[4] Noise estimation based on time-frequency correlation for speech enhancement
Yuan, Wenhao
Lin, Jiajun
An, Wei
Wang, Yu
Chen, Ning
[J]. APPLIED ACOUSTICS, 2013, 74 (05) : 770 - 781
[5] ON TIME-FREQUENCY MASK ESTIMATION FOR MVDR BEAMFORMING WITH APPLICATION IN ROBUST SPEECH RECOGNITION
Xiao, Xiong
Zhao, Shengkui
Jones, Douglas L.
Chng, Eng Siong
Li, Haizhou
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 3246 - 3250
[6] Underdetermined blind separation of convolutive mixtures of speech using time-frequency mask and mixing matrix estimation
Blin, A
Araki, S
Makino, S
[J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2005, E88A (07) : 1693 - 1700
[7] Binaural Speech Separation Based on the Time-Frequency Binary Mask
Mahmoodzadeh, A.
Abutalebi, H. R.
Soltanian-Zadeh, H.
Sheikhzadeh, H.
[J]. 2012 SIXTH INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS (IST), 2012, : 848 - 853
[8] Speech presence detection in the time-frequency domain using minimum statistics
Sorensen, KV
Andersen, SV
[J]. NORSIG 2004: PROCEEDINGS OF THE 6TH NORDIC SIGNAL PROCESSING SYMPOSIUM, 2004, 46 : 340 - 343
[9] SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON INTEGRATED TIME-FREQUENCY MINIMUM TRACKING FOR SPEECH ENHANCEMENT IN ADVERSE ENVIRONMENTS
Fu, Zhong-Hua
Wang, Jhing-Fa
[J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4258 - 4261
[10] Improved a posteriori Speech Presence Probability Estimation Based on Cepstro-Temporal Smoothing and Time-Frequency Correlation
Li, Chao
Liu, Wenju
[J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1208 - 1211
