Using Optimal Ratio Mask as Training Target for Supervised Speech Separation

Times cited: 0
Authors
Xia, Shasha [1]
Li, Hao [1]
Zhang, Xueliang [1]
Affiliation
[1] Inner Mongolia University, Hohhot, People's Republic of China
Keywords
NOISE
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline code
081202
Abstract
Supervised speech separation uses supervised learning algorithms to learn a mapping from a noisy input signal to a target output. With the rapid development of deep learning, supervised separation has become the most important research direction in the speech separation field in recent years. For supervised algorithms, the choice of training target has a significant impact on performance. The ideal ratio mask (IRM) is a commonly used training target that can improve the intelligibility and quality of the separated speech. However, it does not take into account the correlation between noise and clean speech. In this paper, we use the optimal ratio mask (ORM) as the training target of a deep neural network (DNN) for speech separation. Experiments are carried out under various noise environments and signal-to-noise ratio (SNR) conditions. The results show that the optimal ratio mask generally outperforms the other training targets evaluated.
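To make the contrast between the two training targets concrete, here is a minimal sketch of both oracle masks. It assumes the definitions common in the literature rather than the paper's exact formulation; any compression, clipping or scaling the authors apply to the mask is not given in this record. With clean-speech STFT S, noise STFT N and mixture Y = S + N, the ideal ratio mask (IRM) is (|S|^2 / (|S|^2 + |N|^2))^β. The optimal ratio mask (ORM) is the real-valued mask M that minimizes |S - M·Y|^2 in each time-frequency unit, which works out to (|S|^2 + Re(S N*)) / (|S|^2 + |N|^2 + 2 Re(S N*)). The cross term Re(S N*) is the speech-noise correlation that the IRM ignores. The function names, β = 0.5, the 257×100 shape and the random Gaussian "spectrograms" are hypothetical stand-ins, not the paper's data or code.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in silent T-F units


def ideal_ratio_mask(S: np.ndarray, N: np.ndarray, beta: float = 0.5) -> np.ndarray:
    """IRM per T-F unit: (|S|^2 / (|S|^2 + |N|^2)) ** beta, bounded to [0, 1].

    Assumes speech and noise are uncorrelated, i.e. drops Re(S N*).
    """
    ps = np.abs(S) ** 2
    pn = np.abs(N) ** 2
    return (ps / (ps + pn + EPS)) ** beta


def optimal_ratio_mask(S: np.ndarray, N: np.ndarray) -> np.ndarray:
    """Real-valued mask minimizing |S - M * (S + N)|^2 in each T-F unit.

    M = (|S|^2 + Re(S N*)) / (|S|^2 + |N|^2 + 2 Re(S N*)).
    The denominator equals |S + N|^2, computed directly for numerical safety.
    Unlike the IRM, this mask is not bounded to [0, 1].
    """
    ps = np.abs(S) ** 2
    cross = np.real(S * np.conj(N))  # speech-noise correlation term
    return (ps + cross) / (np.abs(S + N) ** 2 + EPS)


# Hypothetical synthetic STFTs (frequency bins x frames): random Gaussian
# stand-ins for real speech and noise, used only to exercise the formulas.
rng = np.random.default_rng(0)
shape = (257, 100)
S = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
N = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
Y = S + N


def masked_mse(M: np.ndarray) -> float:
    """Mean squared error between clean speech and the masked mixture."""
    return float(np.mean(np.abs(S - M * Y) ** 2))


err_irm = masked_mse(ideal_ratio_mask(S, N))
err_orm = masked_mse(optimal_ratio_mask(S, N))
print(f"IRM MSE: {err_irm:.4f}, ORM MSE: {err_orm:.4f}")
```

Because the ORM is, by construction, the per-unit minimizer of this squared error over all real-valued masks, its oracle error can never exceed the IRM's in this setting. How much of that advantage survives in practice depends on how well a DNN can estimate the mask from the noisy input, which is what the paper's experiments measure.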
Pages: 163-166
Page count: 4