A CONVEX OPTIMIZATION APPROACH FOR TIME-FREQUENCY MASK ESTIMATION

Cited by: 0
Authors
Bao, Feng [1]
Abdulla, Waleed H. [1]
Affiliations
[1] Univ Auckland, Elect & Comp Engn Dept, 20 Symonds St, Auckland 1010, New Zealand
Keywords
Computational auditory scene analysis (CASA); Ideal binary mask (IBM); Convex optimization; Speech enhancement; SPEECH; NOISE; ENHANCEMENT
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this paper, we propose a new time-frequency mask estimation method for computational auditory scene analysis (CASA) based on convex optimization of the binary mask. In the proposed method, the pitch estimation and segment segregation of conventional CASA are completely replaced by convex optimization of the speech power. The objective function for the speech power is built from the cross-correlation between the power spectra of the noisy speech and the noise in each channel of a Gammatone filterbank, and the speech power is estimated by gradient descent. The time-frequency units dominated by speech and by noise are then labeled by comparing the powers of the noisy speech, the estimated speech, and the noise. Erroneous local masks are removed using the Teager energy of the estimated speech and time-frequency unit smoothing. Results for average segmental signal-to-noise ratio improvement, HIT-False Alarm rate, and a subjective test show that the proposed method outperforms the reference methods.
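The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the objective function, so the quadratic objective, the regularization weight `lam`, the step size, and all function names below are assumptions chosen only for illustration, not the authors' formulation. The Teager energy operator and the HIT-FA rate follow their standard definitions; how the paper thresholds the Teager energy is not specified here and is left out.

```python
import numpy as np

def estimate_speech_power(noisy_pow, noise_pow, lam=0.1, step=0.2, iters=200):
    """Projected gradient descent on an ILLUSTRATIVE convex objective
    (an assumption, not the objective used in the paper):
        f(s) = 0.5 * (noisy_pow - noise_pow - s)**2 + 0.5 * lam * s**2,  s >= 0
    evaluated independently in every time-frequency unit."""
    noisy_pow = np.asarray(noisy_pow, dtype=float)
    noise_pow = np.asarray(noise_pow, dtype=float)
    s = np.zeros_like(noisy_pow)
    for _ in range(iters):
        grad = -(noisy_pow - noise_pow - s) + lam * s
        s = np.maximum(s - step * grad, 0.0)  # project onto s >= 0
    return s

def binary_mask(speech_pow, noise_pow):
    """Label a unit 1 when the estimated speech power exceeds the noise power."""
    return (np.asarray(speech_pow) > np.asarray(noise_pow)).astype(int)

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def hit_fa(est_mask, ideal_mask):
    """HIT rate minus false-alarm rate of an estimated mask against the IBM."""
    est = np.asarray(est_mask, dtype=bool)
    ideal = np.asarray(ideal_mask, dtype=bool)
    hit = (est & ideal).sum() / ideal.sum()
    fa = (est & ~ideal).sum() / (~ideal).sum()
    return hit - fa

# Toy 2x2 grid of time-frequency units (made-up power values).
noisy = np.array([[4.0, 1.0], [1.2, 5.0]])
noise = np.ones((2, 2))
speech = estimate_speech_power(noisy, noise)
mask = binary_mask(speech, noise)
```

With this toy objective the minimizer has the closed form `max(noisy - noise, 0) / (1 + lam)`, which gives a simple check that the gradient iteration has converged.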
Pages: 31-35
Number of pages: 5
Related papers
50 in total
  • [1] A new time-frequency binary mask estimation method based on convex optimization of speech power
    Bao, Feng
    Abdulla, Waleed H.
    [J]. SPEECH COMMUNICATION, 2018, 97 : 51 - 65
  • [2] Variance based time-frequency mask estimation for unsupervised speech enhancement
    Nasir Saleem
    Muhammad Irfan Khattak
    Gunawan Witjaksono
    Gulzar Ahmad
    [J]. Multimedia Tools and Applications, 2019, 78 : 31867 - 31891
  • [3] Variance based time-frequency mask estimation for unsupervised speech enhancement
    Saleem, Nasir
    Khattak, Muhammad Irfan
    Witjaksono, Gunawan
    Ahmad, Gulzar
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (22) : 31867 - 31891
  • [4] A data-driven approach for estimating the time-frequency binary mask
    Kim, Gibak
    Loizou, Philipos C.
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 884 - 887
  • [5] Spectrographic Speech Mask Estimation Using the Time-Frequency Correlation of Speech Presence
    Zhan, Ge
    Huang, Zhaoqiong
    Ying, Dongwen
    Pan, Jielin
    Yan, Yonghong
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2287 - 2291
  • [6] ON TIME-FREQUENCY MASK ESTIMATION FOR MVDR BEAMFORMING WITH APPLICATION IN ROBUST SPEECH RECOGNITION
    Xiao, Xiong
    Zhao, Shengkui
    Jones, Douglas L.
    Chng, Eng Siong
    Li, Haizhou
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 3246 - 3250
  • [7] Carrier Frequency Estimation of Time-Frequency Overlapped MASK Signals for Underlay Cognitive Radio Network
    Liu, Mingqian
    Zhang, Junlin
    Lin, Yun
    Wu, Zhen
    Shang, Bodong
    Gong, Fengkui
    [J]. IEEE ACCESS, 2019, 7 : 58277 - 58285
  • [8] DIRECTION OF ARRIVAL ESTIMATION IN HIGHLY REVERBERANT ENVIRONMENTS USING SOFT TIME-FREQUENCY MASK
    Tourbabin, Vladimir
    Donley, Jacob
    Rafaely, Boaz
    Mehra, Ravish
    [J]. 2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2019, : 383 - 387
  • [9] AUGMENTED TIME-FREQUENCY MASK ESTIMATION IN CLUSTER-BASED SOURCE SEPARATION ALGORITHMS
    Luo, Yi
    Mesgarani, Nima
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 710 - 714
  • [10] TIME DELAY ESTIMATION IN THE TIME-FREQUENCY DOMAIN BASED ON A LINE DETECTION APPROACH
    Sandmair, Andreas
    Lietz, Mario
    Stefan, Johannes
    Leon, Fernando Puente
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 2716 - 2719