Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming

Cited by: 8

Authors:

Yin, Lu [1,2]

Wang, Ziteng [1,2]

Xia, Risheng [1]

Li, Junfeng [1,2]

Yan, Yonghong [1,2,3]

Affiliations:

[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Xinjiang Lab Minor Speech & Language Informat Pro, Beijing, Peoples R China

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

multi-channel speech separation; beamforming; permutation invariant training; mask estimation;

DOI:

10.21437/Interspeech.2018-1739

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The recently proposed Permutation Invariant Training (PIT) technique addresses the label permutation problem in multi-talker speech separation and has been shown to be effective in the single-channel case. In this paper, we propose to extend the PIT-based technique to the multi-channel multi-talker speech separation scenario. PIT is used to train a neural network that outputs a mask for each speaker and is followed by a Minimum Variance Distortionless Response (MVDR) beamformer. The beamformer exploits the spatial information of the different speakers and alleviates the performance degradation caused by misaligned labels. Experimental results show that the proposed PIT-MVDR-based technique yields higher Signal-to-Distortion Ratios (SDRs) than the single-channel speech separation method when tested on two-speaker and three-speaker mixtures.
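The two components named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, array shapes, mean-squared-error criterion, and the particular MVDR formulation (a widely used mask-weighted covariance variant with a reference channel) are all assumptions made for illustration, and the record does not state which variants the paper uses.

```python
import itertools
import numpy as np


def pit_mse(estimates, targets):
    """Permutation-invariant MSE (illustrative, hypothetical helper).

    estimates, targets: real arrays of shape (speakers, frames, freqs).
    Returns the smallest mean squared error over all assignments of
    targets to network outputs, and the permutation that achieves it.
    """
    num_speakers = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(num_speakers)):
        # perm[i] is the index of the target compared with output i.
        loss = np.mean((estimates - targets[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def mask_based_mvdr(mixture_stft, target_mask, ref_channel=0, eps=1e-6):
    """Mask-driven MVDR beamformer (assumed formulation, not from the paper).

    mixture_stft: complex array of shape (channels, frames, freqs).
    target_mask:  real array of shape (frames, freqs) with values in [0, 1].
    Returns the beamformed STFT of shape (frames, freqs).
    """
    num_channels, num_frames, num_freqs = mixture_stft.shape
    interference_mask = 1.0 - target_mask
    output = np.zeros((num_frames, num_freqs), dtype=complex)
    for f in range(num_freqs):
        y = mixture_stft[:, :, f]  # (channels, frames)
        # Mask-weighted spatial covariance matrices for this frequency bin.
        phi_target = (target_mask[:, f] * y) @ y.conj().T
        phi_target /= target_mask[:, f].sum() + eps
        phi_interf = (interference_mask[:, f] * y) @ y.conj().T
        phi_interf /= interference_mask[:, f].sum() + eps
        phi_interf += eps * np.eye(num_channels)  # diagonal loading
        # w = (phi_interf^-1 phi_target) u / trace(phi_interf^-1 phi_target),
        # where u selects the reference channel.
        ratio = np.linalg.solve(phi_interf, phi_target)
        weights = ratio[:, ref_channel] / (np.trace(ratio) + eps)
        output[:, f] = weights.conj() @ y
    return output
```

In this sketch, a separation network trained with `pit_mse` would supply one mask per speaker, and `mask_based_mvdr` would be called once per speaker with that speaker's mask to obtain the spatially filtered signal.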


Pages: 851-855

Page count: 5

