Using Optimal Ratio Mask as Training Target for Supervised Speech Separation

Authors
Xia, Shasha [1]
Li, Hao [1]
Zhang, Xueliang [1]
Affiliations
[1] Inner Mongolia Univ, Hohhot, Peoples R China
Keywords
NOISE
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Supervised speech separation uses supervised learning algorithms to learn a mapping from an input noisy signal to an output target. With the rapid development of deep learning, supervised separation has become the most important direction in the speech separation area in recent years. For a supervised algorithm, the training target has a significant impact on performance. The ideal ratio mask is a commonly used training target that can improve the intelligibility and quality of the separated speech. However, it does not take into account the correlation between noise and clean speech. In this paper, we use the optimal ratio mask as the training target of a deep neural network (DNN) for speech separation. The experiments are carried out under various noise environments and signal-to-noise ratio (SNR) conditions. The results show that the optimal ratio mask generally outperforms the other training targets.
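The abstract does not state the mask formulas. The following is a minimal sketch, assuming the definitions commonly used in the speech separation literature: the ideal ratio mask compares speech and noise power in a single time-frequency unit, while the optimal ratio mask adds the cross term Re(S·N*) that captures the correlation between clean speech and noise. The function names, the `beta` exponent, and the `eps` guard are illustrative assumptions and are not taken from the paper.

```python
def ideal_ratio_mask(s: complex, n: complex,
                     beta: float = 0.5, eps: float = 1e-12) -> float:
    """Ideal ratio mask for one time-frequency unit (assumed common form).

    s and n are the complex STFT coefficients of clean speech and noise.
    The mask ignores any correlation between s and n.
    """
    speech_power = abs(s) ** 2
    noise_power = abs(n) ** 2
    return (speech_power / (speech_power + noise_power + eps)) ** beta


def optimal_ratio_mask(s: complex, n: complex, eps: float = 1e-12) -> float:
    """Optimal ratio mask for one time-frequency unit (assumed common form).

    The cross term Re(s * conj(n)) accounts for speech-noise correlation.
    The denominator equals |s + n|^2, the power of the noisy mixture.
    """
    speech_power = abs(s) ** 2
    noise_power = abs(n) ** 2
    cross = (s * n.conjugate()).real
    return (speech_power + cross) / (speech_power + noise_power + 2 * cross + eps)


# Uncorrelated case: s and n are orthogonal, so the cross term is zero
# and the two masks agree when beta = 1.
uncorrelated_orm = optimal_ratio_mask(1 + 0j, 0 + 1j)
uncorrelated_irm = ideal_ratio_mask(1 + 0j, 0 + 1j, beta=1.0)

# Correlated case: s and n are in phase, so the masks differ.
correlated_orm = optimal_ratio_mask(2 + 0j, 1 + 0j)
correlated_irm = ideal_ratio_mask(2 + 0j, 1 + 0j, beta=1.0)
```

When the two signals are uncorrelated in a unit, the cross term vanishes and both masks coincide (for `beta = 1`); when they are correlated, the values diverge, which is the situation the abstract says the ideal ratio mask does not account for. Note that under this definition the optimal ratio mask is not bounded to [0, 1], so a practical training setup would likely need some form of range compression.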
Pages: 163-166
Page count: 4