ENHANCED TIME-FREQUENCY MASKING BY USING NEURAL NETWORKS FOR MONAURAL SOURCE SEPARATION IN REVERBERANT ROOM ENVIRONMENTS

Cited: 0
|
Authors
Sun, Yang [1 ]
Wang, Wenwu [2 ]
Chambers, Jonathon A. [1 ]
Naqvi, Syed Mohsen [1 ]
Affiliations
[1] Newcastle Univ, Intelligent Sensing & Commun Res Grp, Newcastle Upon Tyne, Tyne & Wear, England
[2] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, Surrey, England
Keywords
source separation; reverberant room environments; dereverberation; time-frequency mask; SPEECH; RECOGNITION; NOISE;
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
Deep neural networks (DNNs) have been used for dereverberation and denoising in the monaural source separation problem. However, the performance of current state-of-the-art methods is limited, particularly when applied in highly reverberant room environments. In this paper, we propose an enhanced time-frequency (T-F) mask to improve the separation performance. The ideal enhanced mask (IEM) consists of the dereverberation mask (DM) and the ideal ratio mask (IRM). The DM is applied to remove the reverberation from the speech mixture, and the IRM helps with denoising. Speech mixtures for evaluation are generated from the IEEE and TIMIT corpora convolved with real room impulse responses (RIRs), with noise taken from the NOISEX dataset. The proposed method outperforms the state-of-the-art methods, particularly in highly reverberant and noisy room environments.
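The abstract describes an ideal enhanced mask built by combining a dereverberation mask with an ideal ratio mask. The Python sketch below is only a rough illustration of that general idea, not the paper's exact formulation: it assumes oracle access to magnitude spectrograms of the direct-path speech, its reverberant image, and the additive noise, and the specific mask definitions and variable names (dm, irm, enhanced_mask) are illustrative assumptions.

# Minimal sketch (assumed, not the authors' exact method): combine a
# dereverberation-style mask (DM) with an ideal ratio mask (IRM) into an
# "enhanced" time-frequency mask, given oracle magnitude spectrograms.
import numpy as np

rng = np.random.default_rng(0)

# Placeholder magnitude spectrograms (freq bins x frames); in practice these
# would come from STFTs of the corresponding signals.
F, T = 257, 100
direct = np.abs(rng.standard_normal((F, T)))                       # direct-path (anechoic) speech
reverberant = direct + 0.5 * np.abs(rng.standard_normal((F, T)))   # speech plus reverberation tail
noise = 0.3 * np.abs(rng.standard_normal((F, T)))                  # additive noise

eps = 1e-8

# Dereverberation-style mask: ratio of direct-path energy to reverberant speech energy.
dm = direct**2 / (reverberant**2 + eps)

# Ideal ratio mask: ratio of (reverberant) speech energy to speech-plus-noise energy.
irm = reverberant**2 / (reverberant**2 + noise**2 + eps)

# Enhanced mask: apply both effects; clip to [0, 1] for numerical stability.
enhanced_mask = np.clip(dm * irm, 0.0, 1.0)

# Applying the mask to the noisy reverberant mixture magnitude gives an
# estimate of the clean direct-path speech magnitude.
mixture = np.sqrt(reverberant**2 + noise**2)
estimated_clean = enhanced_mask * mixture
print(estimated_clean.shape)

In a DNN-based system of this kind, the network would typically be trained to predict such a mask from features of the noisy reverberant mixture; the oracle signals above are used only to define the training target.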
Pages: 1647-1651
Number of pages: 5
Related Papers
50 records in total
  • [21] Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking
    Reju, Vaninirappuputhenpurayil Gopalan
    Koh, Soo Ngee
    Soon, Ing Yann
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (01): 101 - 116
  • [22] Blind speech source separation via nonlinear time-frequency masking
    Xu, Shun
    Chen, Shaorong
    Liu, Yulin
    Shengxue Xuebao/Acta Acustica, 2007, 32 (04): 375 - 381
  • [23] Time-frequency masking for blind source separation with preserved spatial cues
    Pirhosseinloo, Shadi
    Kokkinakis, Kostas
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1188 - 1192
  • [24] Blind speech source separation via nonlinear time-frequency masking
    Xu, Shun
    Chen, Shaorong
    Liu, Yulin
    CHINESE JOURNAL OF ACOUSTICS, 2008, (03): 203 - 214
  • [25] Unsupervised Learning for Monaural Source Separation Using Maximization-Minimization Algorithm with Time-Frequency Deconvolution
    Woo, Wai Lok
    Gao, Bin
    Bouridane, Ahmed
    Ling, Bingo Wing-Kuen
    Chin, Cheng Siong
    SENSORS, 2018, 18 (05)
  • [26] Separation of Cardiorespiratory Sounds Using Time-Frequency Masking and Sparsity
    Shah, Ghafoor
    Papadias, Constantinos
    2013 18TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2013,
  • [27] INFORMED SOURCE SEPARATION FROM MONAURAL MUSIC WITH LIMITED BINARY TIME-FREQUENCY ANNOTATION
    Jeong, Il-Young
    Lee, Kyogu
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 489 - 493
  • [28] A batch algorithm for blind source separation of acoustic signals using ICA and time-frequency masking
    Hoffmann, Eugen
    Kolossa, Dorothea
    Orglmeister, Reinhold
    INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS, 2007, 4666 : 480 - +
  • [29] Stereo audio source separation based on time-frequency masking and multilevel thresholding
    Cobos, Maximo
    Lopez, Jose J.
    DIGITAL SIGNAL PROCESSING, 2008, 18 (06) : 960 - 976
  • [30] Overcomplete blind source separation by combining ICA and binary time-frequency masking
    Pedersen, MHS
    Wang, DL
    Larsen, J
    Kjems, U
    2005 IEEE WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2005, : 15 - 20