Probabilistic Permutation Invariant Training for Speech Separation

Cited by: 12
Authors
Yousefi, Midia [1 ]
Khorram, Soheil [1 ]
Hansen, John H. L. [1 ]
Affiliations
[1] Univ Texas Dallas, Ctr Robust Speech Syst CRSS, Richardson, TX 75083 USA
Keywords
probabilistic permutation invariant training; PIT; permutation ambiguity; source separation; speech separation; neural networks
DOI
10.21437/Interspeech.2019-1827
CLC Classification
R36 (Pathology); R76 (Otorhinolaryngology)
Subject Classification
100104; 100213
Abstract
Single-microphone, speaker-independent speech separation is normally performed through two steps: (i) separating the specific speech sources, and (ii) determining the best output-label assignment to find the separation error. The second step is the main obstacle in training neural networks for speech separation. Recently proposed Permutation Invariant Training (PIT) addresses this problem by determining the output-label assignment which minimizes the separation error. In this study, we show that a major drawback of this technique is the overconfident choice of the output-label assignment, especially in the initial steps of training when the network generates unreliable outputs. To solve this problem, we propose Probabilistic PIT (Prob-PIT) which considers the output-label permutation as a discrete latent random variable with a uniform prior distribution. Prob-PIT defines a log-likelihood function based on the prior distributions and the separation errors of all permutations; it trains the speech separation networks by maximizing the log-likelihood function. Prob-PIT can be easily implemented by replacing the minimum function of PIT with a soft-minimum function. We evaluate our approach for speech separation on both TIMIT and CHiME datasets. The results show that the proposed method significantly outperforms PIT in terms of Signal to Distortion Ratio and Signal to Interference Ratio.
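The core change the abstract describes, replacing PIT's hard minimum over output-label permutations with a soft-minimum, can be illustrated with a short sketch. This is a minimal NumPy illustration, not the authors' implementation: the mean-squared separation error, the temperature parameter `gamma`, and the function names are assumptions made for clarity (the paper derives the soft-minimum from a log-likelihood with a uniform prior over permutations).

```python
import numpy as np
from itertools import permutations

def pit_loss(outputs, targets):
    """Standard PIT: take the minimum separation error over all
    output-label permutations (a hard, overconfident assignment)."""
    costs = [np.mean((outputs[list(p)] - targets) ** 2)
             for p in permutations(range(len(targets)))]
    return min(costs)

def prob_pit_loss(outputs, targets, gamma=1.0):
    """Prob-PIT sketch: a soft-minimum over permutation costs.
    As gamma -> 0 this recovers the hard minimum of PIT; larger
    gamma lets all permutations contribute to the training signal."""
    costs = np.array([np.mean((outputs[list(p)] - targets) ** 2)
                      for p in permutations(range(len(targets)))])
    m = costs.min()
    # soft-min(c) = -gamma * log sum_p exp(-c_p / gamma),
    # computed in a numerically stable form by factoring out the minimum
    return m - gamma * np.log(np.sum(np.exp(-(costs - m) / gamma)))
```

The soft-minimum is always a lower bound on the hard minimum and is smooth in the network outputs, which is what allows gradients from every permutation, not just the currently "winning" one, to shape early training.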
Pages: 4604-4608
Page count: 5
Related Papers
50 records in total
  • [1] ON PERMUTATION INVARIANT TRAINING FOR SPEECH SOURCE SEPARATION
    Liu, Xiaoyu
    Pons, Jordi
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6 - 10
  • [2] INTERRUPTED AND CASCADED PERMUTATION INVARIANT TRAINING FOR SPEECH SEPARATION
    Yang, Gene-Ping
    Wu, Szu-Lin
    Mao, Yao-Wen
    Lee, Hung-yi
    Lee, Lin-shan
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6369 - 6373
  • [3] Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation
    Chen, Lianwu
    Yu, Meng
    Qian, Yanmin
    Su, Dan
    Yu, Dong
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 302 - 306
  • [4] Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming
    Yin, Lu
    Wang, Ziteng
    Xia, Risheng
    Li, Junfeng
    Yan, Yonghong
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 851 - 855
  • [5] MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training
    Karamatli, Ertug
    Kirbiz, Serap
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2637 - 2641
  • [6] Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation
    Fan, Cunhang
    Liu, Bin
    Tao, Jianhua
    Wen, Zhengqi
    Yi, Jiangyan
    Bai, Ye
    [J]. 2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 26 - 30
  • [7] Single-channel speech separation using soft-minimum permutation invariant training
    Yousefi, Midia
    Hansen, John H. L.
    [J]. SPEECH COMMUNICATION, 2023, 151 : 76 - 85
  • [8] SINGLE CHANNEL SPEECH SEPARATION WITH CONSTRAINED UTTERANCE LEVEL PERMUTATION INVARIANT TRAINING USING GRID LSTM
    Xu, Chenglin
    Rao, Wei
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6 - 10
  • [9] Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks
    Kolbaek, Morten
    Yu, Dong
    Tan, Zheng-Hua
    Jensen, Jesper
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (10) : 1901 - 1913
  • [10] PERMUTATION INVARIANT TRAINING OF DEEP MODELS FOR SPEAKER-INDEPENDENT MULTI-TALKER SPEECH SEPARATION
    Yu, Dong
    Kolbaek, Morten
    Tan, Zheng-Hua
    Jensen, Jesper
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 241 - 245