Speech mask estimation using the time-frequency correlation of speech presence

Cited by: 0
Authors
Zhan, Ge [1]
Huang, Zhao-Qiong [1]
Ying, Dong-Wen [1]
Pan, Jie-Lin [1]
Yan, Yong-Hong [1]
Affiliation
[1] Institute of Acoustics, The Chinese Academy of Sciences, Beijing 100190, China
Funding
National Natural Science Foundation of China
Keywords
Frequency correlation; Neighbor factor; On-line estimation; Posteriori probability; State transition probabilities; Time frequency; Time frequency domain; Two-dimensional (2-D)
DOI
Not available
Abstract
This paper proposes a method for estimating the spectrographic speech mask based on a two-dimensional (2-D) correlation model. The method is motivated by the fact that the time and frequency correlations of speech presence are interwoven in the time-frequency (TF) domain. A conventional Markov chain cannot model the time and frequency correlations simultaneously and adaptively. The 2-D correlation model describes the correlation of speech presence in the TF domain, with speech presence and speech absence taken as the two states of the model. The time correlation is modeled by the time state-transition probability and the forward factor, while the frequency correlation is described by the frequency state-transition probability and the corresponding neighbor factor. Both correlations are incorporated into the model by maximizing the Q-function. A sequential scheme is presented to estimate the parameter set online. Given the observed spectrum and the parameter set, the state matrix that maximizes the a posteriori probability is taken as the optimal estimate of the speech mask. The proposed method was compared with several well-established methods, and the experimental results confirmed its superiority. © Copyright 2016, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
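The idea of a two-state mask with coupled time and frequency correlations can be illustrated with a small sketch. This is not the paper's algorithm: it swaps in a plain iterated-conditional-modes update on a grid model, uses fixed parameters instead of the sequential online estimation, and omits the forward and neighbor factors. The function name `estimate_mask`, the parameters `p_stay_time` and `p_stay_freq`, and the exponential power model are all assumptions made for illustration.

```python
import numpy as np

def estimate_mask(power, noise_var, speech_var,
                  p_stay_time=0.9, p_stay_freq=0.8, n_iter=5):
    """Binary speech-presence mask for a (frequency, time) power spectrogram.

    Hypothetical sketch: every parameter here is fixed by hand, whereas the
    paper estimates its parameter set online.
    """
    power = np.asarray(power, dtype=float)
    # Log-likelihood under absence (index 0) and presence (index 1), assuming
    # exponentially distributed power (complex Gaussian spectrum).
    var = np.stack([np.full_like(power, noise_var),
                    np.full_like(power, noise_var + speech_var)])
    loglik = -np.log(var) - power / var

    # Log-odds reward for sharing a state with one neighbor on each axis.
    w_time = np.log(p_stay_time / (1.0 - p_stay_time))
    w_freq = np.log(p_stay_freq / (1.0 - p_stay_freq))

    mask = (loglik[1] > loglik[0]).astype(int)  # likelihood-only start
    for _ in range(n_iter):
        s = 2.0 * mask - 1.0                    # {0, 1} -> {-1, +1}
        padded = np.pad(s, 1)                   # zero padding: no vote at borders
        vote_time = padded[1:-1, :-2] + padded[1:-1, 2:]
        vote_freq = padded[:-2, 1:-1] + padded[2:, 1:-1]
        # Local log-posterior ratio of presence versus absence, given the
        # current states of the four neighbors (synchronous update).
        llr = loglik[1] - loglik[0] + w_time * vote_time + w_freq * vote_freq
        mask = (llr > 0).astype(int)
    return mask
```

With `n_iter=0` the function reduces to an independent per-bin likelihood test, so comparing the two settings shows what the time and frequency coupling contributes: isolated noise outliers are suppressed and weak bins inside a speech region are filled in.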
Pages: 64-68
Related papers
50 in total
  • [1] Spectrographic Speech Mask Estimation Using the Time-Frequency Correlation of Speech Presence
    Zhan, Ge
    Huang, Zhaoqiong
    Ying, Dongwen
    Pan, Jielin
    Yan, Yonghong
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 2287 - 2291
  • [2] Variance based time-frequency mask estimation for unsupervised speech enhancement
    Nasir Saleem
    Muhammad Irfan Khattak
    Gunawan Witjaksono
    Gulzar Ahmad
    [J]. Multimedia Tools and Applications, 2019, 78 : 31867 - 31891
  • [3] Variance based time-frequency mask estimation for unsupervised speech enhancement
    Saleem, Nasir
    Khattak, Muhammad Irfan
    Witjaksono, Gunawan
    Ahmad, Gulzar
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (22) : 31867 - 31891
  • [4] Noise estimation based on time-frequency correlation for speech enhancement
    Yuan, Wenhao
    Lin, Jiajun
    An, Wei
    Wang, Yu
    Chen, Ning
    [J]. APPLIED ACOUSTICS, 2013, 74 (05) : 770 - 781
  • [5] ON TIME-FREQUENCY MASK ESTIMATION FOR MVDR BEAMFORMING WITH APPLICATION IN ROBUST SPEECH RECOGNITION
    Xiao, Xiong
    Zhao, Shengkui
    Jones, Douglas L.
    Chng, Eng Siong
    Li, Haizhou
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017: 3246 - 3250
  • [6] Underdetermined blind separation of convolutive mixtures of speech using time-frequency mask and mixing matrix estimation
    Blin, A
    Araki, S
    Makino, S
    [J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2005, E88A (07) : 1693 - 1700
  • [7] Binaural Speech Separation Based on the Time-Frequency Binary Mask
    Mahmoodzadeh, A.
    Abutalebi, H. R.
    Soltanian-Zadeh, H.
    Sheikhzadeh, H.
    [J]. 2012 SIXTH INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS (IST), 2012: 848 - 853
  • [8] Speech presence detection in the time-frequency domain using minimum statistics
    Sorensen, KV
    Andersen, SV
    [J]. NORSIG 2004: PROCEEDINGS OF THE 6TH NORDIC SIGNAL PROCESSING SYMPOSIUM, 2004, 46 : 340 - 343
  • [9] SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON INTEGRATED TIME-FREQUENCY MINIMUM TRACKING FOR SPEECH ENHANCEMENT IN ADVERSE ENVIRONMENTS
    Fu, Zhong-Hua
    Wang, Jhing-Fa
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010: 4258 - 4261
  • [10] Improved a posteriori Speech Presence Probability Estimation Based on Cepstro-Temporal Smoothing and Time-Frequency Correlation
    Li, Chao
    Liu, Wenju
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 1208 - 1211