Using Optimal Ratio Mask as Training Target for Supervised Speech Separation

Cited by: 0
Authors
Xia, Shasha [1]
Li, Hao [1]
Zhang, Xueliang [1]
Affiliations
[1] Inner Mongolia Univ, Hohhot, Peoples R China
Keywords
NOISE;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Supervised speech separation uses supervised learning algorithms to learn a mapping from an input noisy signal to an output target. With the rapid development of deep learning, supervised separation has become the most important direction in the speech separation area in recent years. For a supervised algorithm, the training target has a significant impact on performance. The ideal ratio mask is a commonly used training target, which can improve the intelligibility and quality of the separated speech. However, it does not take into account the correlation between noise and clean speech. In this paper, we use the optimal ratio mask as the training target of the deep neural network (DNN) for speech separation. The experiments are carried out under various noise environments and signal-to-noise ratio (SNR) conditions. The results show that the optimal ratio mask generally outperforms other training targets.
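The contrast the abstract draws between the two masks can be sketched numerically. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the commonly used energy-ratio form of the ideal ratio mask and the least-squares form of the optimal ratio mask (the real-valued gain that minimizes the error between the masked mixture and the clean speech in each time-frequency unit). The function names, the `beta` exponent, and the `EPS` constant are hypothetical choices made for this example.

```python
import numpy as np

EPS = 1e-12  # assumed small constant to avoid division by zero


def ideal_ratio_mask(S, N, beta=0.5):
    """IRM sketch: speech energy over speech-plus-noise energy.

    S and N are complex spectrogram values of clean speech and noise.
    The exponent beta is an assumed tunable parameter.
    """
    ps = np.abs(S) ** 2
    pn = np.abs(N) ** 2
    return (ps / (ps + pn + EPS)) ** beta


def optimal_ratio_mask(S, N):
    """ORM sketch: real gain M minimizing |S - M * (S + N)|^2.

    Unlike the IRM, it keeps the cross term Re(S * conj(N)),
    i.e. the speech-noise correlation in each unit.
    """
    ps = np.abs(S) ** 2
    pn = np.abs(N) ** 2
    cross = np.real(S * np.conj(N))
    # Denominator equals |S + N|^2, the mixture energy.
    return (ps + cross) / (ps + pn + 2.0 * cross + EPS)


# One time-frequency unit where noise partially cancels the speech.
S = np.array([2.0 + 0.0j])
N = np.array([-1.0 + 0.0j])
irm = ideal_ratio_mask(S, N, beta=1.0)
orm = optimal_ratio_mask(S, N)
```

In this toy unit the mixture is `S + N = 1`, so a gain of about 2 is needed to recover the clean value; the ORM sketch yields that, while the IRM sketch stays at 0.8 because it ignores the cross term. The unbounded range of this form of the ORM is visible here; any compression applied before using it as a network target is not shown.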
Pages: 163 - 166
Number of pages: 4
Related papers
50 in total
  • [1] Using Shifted Real Spectrum Mask as Training Target for Supervised Speech Separation
    Liu, Yun
    Zhang, Hui
    Zhang, Xueliang
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1151 - 1155
  • [2] Ideal ratio mask estimation using supervised DNN approach for target speech signal enhancement
    Selvaraj, Poovarasan
    Chandra, E.
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (03) : 1869 - 1883
  • [3] A STRUCTURE-PRESERVING TRAINING TARGET FOR SUPERVISED SPEECH SEPARATION
    Wang, Yuxuan
    Wang, DeLiang
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [4] On Training Targets for Supervised Speech Separation
    Wang, Yuxuan
    Narayanan, Arun
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (12) : 1849 - 1858
  • [5] The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio
    Liang, Shan
    Liu, Wenju
    Jiang, Wei
    Xue, Wei
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2013, 134 (05) : EL452 - EL458
  • [6] A Comparative Study of IBM and IRM Target Mask for Supervised Malay Speech Separation from Noisy Background
    Jamal, Norezmi
    Fuad, N.
    Sha'abani, Mnah
    Shanta, Shahnoor
    5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020, 2021, 179 : 153 - 160
  • [7] Supervised single-channel speech enhancement using ratio mask with joint dictionary learning
    Zhang, Long
    Bao, Guangzhao
    Zhang, Jing
    Ye, Zhongfu
    SPEECH COMMUNICATION, 2016, 82 : 38 - 52
  • [8] Beamforming-based Speech Enhancement based on Optimal Ratio Mask
    Ji, Qiang
    Bao, Changchun
    Cheng, Rui
    CONFERENCE PROCEEDINGS OF 2019 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2019), 2019,
  • [9] Constrained Ratio Mask for Speech Enhancement Using DNN
    Yu, Hongjiang
    Zhu, Wei-Ping
    Yang, Yuhong
    INTERSPEECH 2020, 2020, : 2427 - 2431
  • [10] Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming
    Masuyama, Yoshiki
    Togami, Masahito
    Komatsu, Tatsuya
    INTERSPEECH 2019, 2019, : 2708 - 2712