Mask estimation incorporating phase-sensitive information for speech enhancement

Cited by: 8
Authors
Wang, Xianyun [1]
Bao, Changchun [1]
Affiliations
[1] Beijing University of Technology, Faculty of Information Technology, Speech and Audio Signal Processing Laboratory, Beijing 100124, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Monaural speech enhancement; Phase-sensitive; Mask estimation; MAP; Deep neural network; PARAMETER-ESTIMATION; NOISE; SEPARATION; FEATURES; DATABASE;
DOI
10.1016/j.apacoust.2019.07.009
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In deep neural network (DNN)-based speech enhancement, time-frequency (T-F) masks are commonly used as training targets. Most such masks ignore phase, although recent studies have shown that incorporating phase information into the T-F mask can improve the quality of the enhanced speech. In this paper, we present two techniques for obtaining a T-F mask that takes phase information into account. In the first technique, we first discuss the spectral structure of two phase differences (PDs): the PD between clean and noisy speech and the PD between noise and noisy speech. Based on the specific characteristics of these two PDs, we then propose a parametric ideal ratio mask (IRM) whose parameters are controlled by the cosines of the two PDs, termed the bounded IRM with phase constraint (BIRMP). In the second technique, we propose an optimal estimator based on the generalized maximum a posteriori (GMAP) probability of the complex speech spectrum, defined as the optimal GMAP estimation of complex spectrum (OGMAPC). The OGMAPC estimator can dynamically adjust the scale of the prior information on spectral magnitude and phase. Because speech phase is difficult to predict with DNN-based methods, the second technique uses the spectral magnitude part of the OGMAPC estimator to compute an optimal magnitude mask that incorporates phase information, and the ideal value of this mask is used for DNN training. Experiments show that the proposed methods outperform the reference methods. (C) 2019 Elsevier Ltd. All rights reserved.
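As a rough illustration of the quantities the abstract refers to, the NumPy sketch below computes the cosines of the two phase differences (clean vs. noisy, noise vs. noisy), together with the standard IRM and the widely used phase-sensitive mask. It is not the paper's BIRMP or OGMAPC formulation, whose parameterization is not given in the abstract. The function name `phase_aware_quantities`, the additive mixing model Y = S + N, and the `eps` regularizer are assumptions made for illustration.

```python
import numpy as np

def phase_aware_quantities(S, N, eps=1e-12):
    """Illustrative only: S and N are complex STFTs (same shape) of clean
    speech and noise. Assumes additive mixing, Y = S + N."""
    Y = S + N
    mag_s, mag_n, mag_y = np.abs(S), np.abs(N), np.abs(Y)

    # cos(theta_S - theta_Y) = Re(S * conj(Y)) / (|S| |Y|)
    cos_sy = np.real(S * np.conj(Y)) / (mag_s * mag_y + eps)
    # cos(theta_N - theta_Y) = Re(N * conj(Y)) / (|N| |Y|)
    cos_ny = np.real(N * np.conj(Y)) / (mag_n * mag_y + eps)

    # Standard ideal ratio mask (magnitude only, bounded in [0, 1]).
    irm = np.sqrt(mag_s**2 / (mag_s**2 + mag_n**2 + eps))
    # Standard phase-sensitive mask: magnitude ratio scaled by cos of the PD.
    psm = mag_s / (mag_y + eps) * cos_sy
    return cos_sy, cos_ny, irm, psm
```

Under the additive model the two cosines are tied together by |S| cos(theta_S - theta_Y) + |N| cos(theta_N - theta_Y) = |Y|, which is one reason a mask can be parameterized by both of them.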
Pages: 101-112
Number of pages: 12