Probabilistic Permutation Invariant Training for Speech Separation

Cited by: 12
Authors
Yousefi, Midia [1 ]
Khorram, Soheil [1 ]
Hansen, John H. L. [1 ]
Affiliations
[1] Univ Texas Dallas, Ctr Robust Speech Syst CRSS, Richardson, TX 75083 USA
Keywords
probabilistic permutation invariant training; PIT; permutation ambiguity; source separation; speech separation; neural networks
DOI
10.21437/Interspeech.2019-1827
CLC Classification
R36 (Pathology); R76 (Otorhinolaryngology)
Subject Classification
100104; 100213
Abstract
Single-microphone, speaker-independent speech separation is normally performed through two steps: (i) separating the specific speech sources, and (ii) determining the best output-label assignment to find the separation error. The second step is the main obstacle in training neural networks for speech separation. Recently proposed Permutation Invariant Training (PIT) addresses this problem by determining the output-label assignment which minimizes the separation error. In this study, we show that a major drawback of this technique is the overconfident choice of the output-label assignment, especially in the initial steps of training when the network generates unreliable outputs. To solve this problem, we propose Probabilistic PIT (Prob-PIT) which considers the output-label permutation as a discrete latent random variable with a uniform prior distribution. Prob-PIT defines a log-likelihood function based on the prior distributions and the separation errors of all permutations; it trains the speech separation networks by maximizing the log-likelihood function. Prob-PIT can be easily implemented by replacing the minimum function of PIT with a soft-minimum function. We evaluate our approach for speech separation on both TIMIT and CHiME datasets. The results show that the proposed method significantly outperforms PIT in terms of Signal to Distortion Ratio and Signal to Interference Ratio.
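The core change the abstract describes, replacing PIT's hard minimum over output-label permutations with a soft-minimum, can be illustrated with a short sketch. This is a minimal NumPy illustration, not the authors' implementation: the mean-squared separation error, the temperature parameter `gamma`, and the function names are assumptions made for clarity (the paper derives the soft-minimum from a log-likelihood with a uniform prior over permutations).

```python
import numpy as np
from itertools import permutations

def pit_loss(outputs, targets):
    """Standard PIT: take the minimum separation error over all
    output-label permutations (a hard, overconfident assignment)."""
    costs = [np.mean((outputs[list(p)] - targets) ** 2)
             for p in permutations(range(len(targets)))]
    return min(costs)

def prob_pit_loss(outputs, targets, gamma=1.0):
    """Prob-PIT sketch: a soft-minimum over permutation costs.
    As gamma -> 0 this recovers the hard minimum of PIT; larger
    gamma lets all permutations contribute to the training signal."""
    costs = np.array([np.mean((outputs[list(p)] - targets) ** 2)
                      for p in permutations(range(len(targets)))])
    m = costs.min()
    # soft-min(c) = -gamma * log sum_p exp(-c_p / gamma),
    # computed in a numerically stable form by factoring out the minimum
    return m - gamma * np.log(np.sum(np.exp(-(costs - m) / gamma)))
```

The soft-minimum is always a lower bound on the hard minimum and is smooth in the network outputs, which is what allows gradients from every permutation, not just the currently "winning" one, to shape early training.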
Pages: 4604-4608
Page count: 5
Related Papers
50 records in total
  • [1] ON PERMUTATION INVARIANT TRAINING FOR SPEECH SOURCE SEPARATION
    Liu, Xiaoyu
    Pons, Jordi
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6 - 10
  • [2] INTERRUPTED AND CASCADED PERMUTATION INVARIANT TRAINING FOR SPEECH SEPARATION
    Yang, Gene-Ping
    Wu, Szu-Lin
    Mao, Yao-Wen
    Lee, Hung-yi
    Lee, Lin-shan
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6369 - 6373
  • [3] Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation
    Chen, Lianwu
    Yu, Meng
    Qian, Yanmin
    Su, Dan
    Yu, Dong
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 302 - 306
  • [4] Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming
    Yin, Lu
    Wang, Ziteng
    Xia, Risheng
    Li, Junfeng
    Yan, Yonghong
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 851 - 855
  • [5] MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training
    Karamatli, Ertug
    Kirbiz, Serap
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2637 - 2641
  • [6] Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation
    Fan, Cunhang
    Liu, Bin
    Tao, Jianhua
    Wen, Zhengqi
    Yi, Jiangyan
    Bai, Ye
    [J]. 2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 26 - 30
  • [7] Single-channel speech separation using soft-minimum permutation invariant training
    Yousefi, Midia
    Hansen, John H. L.
    [J]. SPEECH COMMUNICATION, 2023, 151 : 76 - 85
  • [8] SINGLE CHANNEL SPEECH SEPARATION WITH CONSTRAINED UTTERANCE LEVEL PERMUTATION INVARIANT TRAINING USING GRID LSTM
    Xu, Chenglin
    Rao, Wei
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6 - 10
  • [9] Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks
    Kolbaek, Morten
    Yu, Dong
    Tan, Zheng-Hua
    Jensen, Jesper
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (10) : 1901 - 1913
  • [10] PERMUTATION INVARIANT TRAINING OF DEEP MODELS FOR SPEAKER-INDEPENDENT MULTI-TALKER SPEECH SEPARATION
    Yu, Dong
    Kolbaek, Morten
    Tan, Zheng-Hua
    Jensen, Jesper
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 241 - 245