Mask estimation incorporating phase-sensitive information for speech enhancement

Cited by: 8
Authors
Wang, Xianyun [1]
Bao, Changchun [1]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Monaural speech enhancement; Phase-sensitive; Mask estimation; MAP; Deep neural network; PARAMETER-ESTIMATION; NOISE; SEPARATION; FEATURES; DATABASE;
DOI
10.1016/j.apacoust.2019.07.009
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
For deep neural network (DNN)-based methods, time-frequency (T-F) masks are commonly used as the training target. However, most such methods do not focus on phase information, while recent studies have shown that incorporating phase information into the T-F mask can improve the quality of the enhanced speech. In this paper, we present two techniques for obtaining a T-F mask that accounts for phase information. In the first technique, the spectral-structure characteristics of two phase differences (PDs), namely the PD between clean and noisy speech and the PD between noise and noisy speech, are first discussed. Then, based on the specific characteristics of the two PDs, a parametric ideal ratio mask (IRM) whose parameters are controlled by the cosines of the two PDs is proposed, termed the bounded IRM with phase constraint (BIRMP). In the second technique, an optimal estimator based on the generalized maximum a posteriori (GMAP) probability of the complex speech spectrum is proposed and defined as the optimal GMAP estimation of complex spectrum (OGMAPC). The OGMAPC estimator can dynamically adjust the scale of the prior information on spectral magnitude and phase. Because speech phase is difficult to predict with DNN-based methods, the second technique uses the spectral magnitude part of the OGMAPC estimator to calculate an optimal magnitude mask that carries phase information, and its ideal value is used for DNN training. Experiments show that the proposed methods outperform the reference methods. (C) 2019 Elsevier Ltd. All rights reserved.
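As a rough illustration of the quantities the abstract refers to, the sketch below computes, for a single T-F bin, the cosines of the two phase differences, together with the conventional IRM and the widely used phase-sensitive mask from the wider literature. It assumes an additive model Y = S + N, and all function names are hypothetical. The paper's own BIRMP and OGMAPC formulas are not given in this record, so they are not reproduced here.

```python
import cmath
import math


def phase_difference_cosines(s: complex, n: complex) -> tuple[float, float]:
    """Cosines of the two phase differences for one T-F bin.

    s, n: complex STFT coefficients of clean speech and noise.
    Assumption: the noisy coefficient is y = s + n (additive model).
    """
    y = s + n
    cos_sy = math.cos(cmath.phase(s) - cmath.phase(y))  # clean vs. noisy
    cos_ny = math.cos(cmath.phase(n) - cmath.phase(y))  # noise vs. noisy
    return cos_sy, cos_ny


def ideal_ratio_mask(s: complex, n: complex) -> float:
    """Conventional IRM: sqrt(|S|^2 / (|S|^2 + |N|^2)); uses no phase."""
    ps, pn = abs(s) ** 2, abs(n) ** 2
    total = ps + pn
    return math.sqrt(ps / total) if total > 0.0 else 0.0


def phase_sensitive_mask(s: complex, n: complex) -> float:
    """Phase-sensitive mask: (|S| / |Y|) * cos(theta_S - theta_Y)."""
    y = s + n
    if abs(y) == 0.0:
        return 0.0  # undefined when the noisy bin is exactly zero
    return abs(s) / abs(y) * math.cos(cmath.phase(s) - cmath.phase(y))


# Example bin: speech and noise of equal magnitude, 90 degrees apart.
s_bin, n_bin = 1 + 0j, 0 + 1j
cos_sy, cos_ny = phase_difference_cosines(s_bin, n_bin)
irm = ideal_ratio_mask(s_bin, n_bin)
psm = phase_sensitive_mask(s_bin, n_bin)
```

Under the additive model the two cosines are coupled: |S| cos(PD between clean and noisy) + |N| cos(PD between noise and noisy) = |Y|, so they cannot vary independently of the magnitudes.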
Pages: 101-112
Number of pages: 12
Related papers
50 in total
  • [1] Phase-sensitive Speech Enhancement for Cochlear Implant Processing
    Jafari, Pourya S.
    Kang, Hou-Yong
    Wang, Xiaosong
    Fu, Qian-Jie
    Jiang, Hui
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5104 - 5107
  • [2] Speech Enhancement With Phase Sensitive Mask Estimation Using a Novel Hybrid Neural Network
    Hasannezhad, Mojtaba
    Ouyang, Zhiheng
    Zhu, Wei-Ping
    Champagne, Benoit
    [J]. IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2021, 2 : 136 - 150
  • [3] Mask Estimation Using Phase Information and Inter-channel Correlation for Speech Enhancement
    Devi Sowjanya
    Shoba Sivapatham
    Asutosh Kar
    Vladimir Mladenovic
    [J]. Circuits, Systems, and Signal Processing, 2022, 41 : 4117 - 4135
  • [4] Mask Estimation Using Phase Information and Inter-channel Correlation for Speech Enhancement
    Sowjanya, Devi
    Sivapatham, Shoba
    Kar, Asutosh
    Mladenovic, Vladimir
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (07) : 4117 - 4135
  • [5] Phase-Sensitive Joint Learning Algorithms for Deep Learning-Based Speech Enhancement
    Lee, Jinkyu
    Skoglund, Jan
    Shabestary, Turaj
    Kang, Hong-Goo
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (08) : 1276 - 1280
  • [6] Speech Enhancement Method with Geometric Phase Estimation By Incorporating MIXMAX Model
    Wang, Xianyun
    Bao, Changchun
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [7] Optical Resolution Enhancement with Phase-Sensitive Preamplification
    Lim, Oo-Kaw
    Alon, Gideon
    Dutton, Zachary
    Guha, Saikat
    Vasilyev, Michael
    Kumar, Prem
    [J]. 2010 CONFERENCE ON LASERS AND ELECTRO-OPTICS (CLEO) AND QUANTUM ELECTRONICS AND LASER SCIENCE CONFERENCE (QELS), 2010,
  • [8] Phase-Sensitive Decision-Directed SNR Estimator for Single-Channel Speech Enhancement
    Ou, Shifeng
    Song, Peng
    Gao, Ying
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2017, 31 (08)
  • [9] Effect of phase-sensitive environment model and higher order VTS on noisy speech feature enhancement
    Stouten, V
    Van Hamme, H
    Wambacq, P
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 433 - 436
  • [10] Incorporating Broad Phonetic Information for Speech Enhancement
    Lu, Yen-Ju
    Liao, Chien-Feng
    Lu, Xugang
    Hung, Jeih-weih
    Tsao, Yu
    [J]. INTERSPEECH 2020, 2020, : 2417 - 2421