Mask estimation incorporating phase-sensitive information for speech enhancement

Cited by: 8
Authors
Wang, Xianyun [1]
Bao, Changchun [1]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Monaural speech enhancement; Phase-sensitive; Mask estimation; MAP; Deep neural network; PARAMETER-ESTIMATION; NOISE; SEPARATION; FEATURES; DATABASE;
DOI
10.1016/j.apacoust.2019.07.009
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In deep neural network (DNN)-based speech enhancement, time-frequency (T-F) masks are commonly used as the training target. Most such masks, however, ignore phase information, although recent studies have shown that incorporating phase information into the T-F mask can improve the quality of the enhanced speech. In this paper, we present two techniques for obtaining a T-F mask that accounts for phase information. In the first technique, we first discuss the spectral-structure characteristics of two phase differences (PDs): the PD between clean and noisy speech, and the PD between noise and noisy speech. Based on these characteristics, we propose a parametric ideal ratio mask (IRM) whose parameters are controlled by the cosines of the two PDs, termed the bounded IRM with phase constraint (BIRMP). In the second technique, we propose an optimal estimator based on the generalized maximum a posteriori (GMAP) probability of the complex speech spectrum, defined as the optimal GMAP estimation of complex spectrum (OGMAPC). The OGMAPC estimator can dynamically adjust the scale of the prior information on spectral magnitude and phase. Because speech phase is difficult to predict with DNN-based methods, the second technique uses the spectral magnitude part of the OGMAPC estimator to compute an optimal magnitude mask that incorporates phase information, and the ideal value of this mask is used for DNN training. Experiments show that the proposed methods can outperform the reference methods. © 2019 Elsevier Ltd. All rights reserved.
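As a rough illustration of the quantities the abstract refers to, the sketch below computes, for a few T-F bins, the standard ideal ratio mask, the cosines of the two phase differences, and the commonly used clipped phase-sensitive mask. It is not the paper's BIRMP or OGMAPC, whose parameterizations are not given in this record. The additive mixing model Y = S + N, the exponent `beta = 0.5`, the clipping range, the function names, and the sample bin values are all assumptions made for illustration.

```python
import cmath
import math

EPS = 1e-12  # guards against division by zero in silent bins


def ideal_ratio_mask(s: complex, n: complex, beta: float = 0.5) -> float:
    """Standard IRM for one T-F bin: (|S|^2 / (|S|^2 + |N|^2)) ** beta."""
    ps, pn = abs(s) ** 2, abs(n) ** 2
    return (ps / (ps + pn + EPS)) ** beta


def phase_difference_cosines(s: complex, n: complex):
    """Cosines of the clean-vs-noisy and noise-vs-noisy phase differences."""
    y = s + n  # assumption: additive mixing, Y = S + N
    cos_s = math.cos(cmath.phase(s) - cmath.phase(y))
    cos_n = math.cos(cmath.phase(n) - cmath.phase(y))
    return cos_s, cos_n


def phase_sensitive_mask(s: complex, n: complex,
                         lo: float = 0.0, hi: float = 1.0) -> float:
    """Clipped phase-sensitive mask: |S|/|Y| * cos(theta_S - theta_Y)."""
    y = s + n
    cos_s, _ = phase_difference_cosines(s, n)
    psm = abs(s) / (abs(y) + EPS) * cos_s
    return min(max(psm, lo), hi)  # bound the mask to [lo, hi]


# Hypothetical (clean, noise) spectral values, chosen only for illustration.
bins = [
    (1.0 + 0.5j, 0.2 - 0.1j),   # speech-dominated bin
    (0.3 - 0.4j, -0.6 + 0.2j),  # comparable speech and noise
    (0.05 + 0.0j, 0.8 + 0.8j),  # noise-dominated bin
]

for s, n in bins:
    cos_s, cos_n = phase_difference_cosines(s, n)
    print(
        f"IRM={ideal_ratio_mask(s, n):.3f}  "
        f"PSM={phase_sensitive_mask(s, n):.3f}  "
        f"cos(PD_s)={cos_s:+.3f}  cos(PD_n)={cos_n:+.3f}"
    )
```

Under the additive model the two cosines are not independent: projecting S and N onto the direction of Y gives |S|cos(PD_s) + |N|cos(PD_n) = |Y|. This is one reason a mask parameterized by both cosines can carry phase information while still being applied to the noisy magnitude only.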
Pages: 101-112
Page count: 12