Using Optimal Ratio Mask as Training Target for Supervised Speech Separation

Citations: 0
Authors
Xia, Shasha [1]
Li, Hao [1]
Zhang, Xueliang [1]
Affiliations
[1] Inner Mongolia Univ, Hohhot, Peoples R China
Keywords
NOISE
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Supervised speech separation uses supervised learning algorithms to learn a mapping from an input noisy signal to an output target. With the rapid development of deep learning, supervised separation has become the most important direction in speech separation in recent years. For a supervised algorithm, the training target has a significant impact on performance. The ideal ratio mask is a commonly used training target that can improve the intelligibility and quality of the separated speech. However, it does not take into account the correlation between noise and clean speech. In this paper, we use the optimal ratio mask as the training target of a deep neural network (DNN) for speech separation. Experiments are carried out under various noise environments and signal-to-noise ratio (SNR) conditions. The results show that the optimal ratio mask generally outperforms other training targets.
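
The abstract does not state the mask definitions. As an illustration only, the sketch below uses the forms commonly given in the literature for a single time-frequency bin: the ideal ratio mask built from speech and noise power alone, and the optimal ratio mask, which adds the cross term Re(S N*) that captures the speech-noise correlation. The function names, the exponent `beta=0.5`, and the small `eps` guard are assumptions made for this example, not details taken from the paper.

```python
def ideal_ratio_mask(s, n, beta=0.5, eps=1e-12):
    """IRM for one T-F bin; s and n are complex STFT coefficients
    of clean speech and noise (assumed to be known at training time)."""
    ps = abs(s) ** 2          # speech power
    pn = abs(n) ** 2          # noise power
    return (ps / (ps + pn + eps)) ** beta


def optimal_ratio_mask(s, n, eps=1e-12):
    """ORM for one T-F bin; keeps the cross term Re(s * conj(n))
    that the IRM drops. The denominator equals |s + n|^2 (plus eps)."""
    s, n = complex(s), complex(n)
    ps = abs(s) ** 2
    pn = abs(n) ** 2
    cross = (s * n.conjugate()).real   # speech-noise correlation term
    return (ps + cross) / (ps + pn + 2.0 * cross + eps)


# Speech and noise in phase: the cross term is positive and the masks differ.
print(optimal_ratio_mask(2 + 0j, 1 + 0j))
print(ideal_ratio_mask(2 + 0j, 1 + 0j, beta=1.0))
```

When the cross term is zero (for example `s = 1` and `n = 1j`), the two masks coincide for `beta=1.0`. Unlike the IRM, this form of the ORM is not bounded to [0, 1], so a practical training setup would likely compress or clip it; how the paper handles this is not stated in the abstract.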
Pages: 163-166
Page count: 4
Related papers
50 in total
  • [21] Supervised speech separation combined with adaptive beamforming
    Saric, Zoran
    Subotic, Misko
    Bilibajkic, Ruzica
    Barjaktarovic, Marko
    Stojanovic, Jasmina
    COMPUTER SPEECH AND LANGUAGE, 2022, 76
  • [22] Single Channel multi-speaker speech Separation based on quantized ratio mask and residual network
    Shanfa Ke
    Ruimin Hu
    Xiaochen Wang
    Tingzhao Wu
    Gang Li
    Zhongyuan Wang
    Multimedia Tools and Applications, 2020, 79 : 32225 - 32241
  • [23] Single Channel multi-speaker speech Separation based on quantized ratio mask and residual network
    Ke, Shanfa
    Hu, Ruimin
    Wang, Xiaochen
    Wu, Tingzhao
    Li, Gang
    Wang, Zhongyuan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (43-44) : 32225 - 32241
  • [24] Noise Perturbation Improves Supervised Speech Separation
    Chen, Jitong
    Wang, Yuxuan
    Wang, DeLiang
    LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION, LVA/ICA 2015, 2015, 9237 : 83 - 90
  • [25] A Multi-Task Scheme for Supervised DNN-Based Single-Channel Speech Enhancement by Using Speech Presence Probability as the Secondary Training Target
    Wang, Lei
    Zhu, Jie
    Sun, Kangbo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (11) : 1963 - 1970
  • [26] Sparsely Overlapped Speech Training in the Time Domain: Joint Learning of Target Speech Separation and Personal VAD Benefits
    Lin, Qingjian
    Yang, Lin
    Wang, Xuyang
    Xie, Luyuan
    Jia, Chen
    Wang, Junjie
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 689 - 693
  • [27] Impact of Mask Type as Training Target for Speech Intelligibility and Quality in Cochlear-Implant Noise Reduction
    Henry, Fergal
    Glavin, Martin
    Jones, Edward
    Parsi, Ashkan
    Sensors, 2024, 24 (20)
  • [28] IDEAL RATIO MASK ESTIMATION USING DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Narayanan, Arun
    Wang, DeLiang
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7092 - 7096
  • [29] Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech
    Lin, Jingru
    Ge, Meng
    Wang, Wupeng
    Li, Haizhou
    Feng, Mengling
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 1014 - 1018
  • [30] Blind separation of speech target sources using ICA in the frequency domain
    Gholamrezaii M.
    Aghabozorgi M.R.
    Abutalebi H.R.
    2010 5th International Symposium on Telecommunications, IST 2010, 2010, : 765 - 768