Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming

Cited by: 8
Authors
Yin, Lu [1, 2]
Wang, Ziteng [1, 2]
Xia, Risheng [1]
Li, Junfeng [1, 2]
Yan, Yonghong [1, 2, 3]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Xinjiang Lab Minor Speech & Language Informat Pro, Beijing, Peoples R China
Keywords
multi-channel speech separation; beamforming; permutation invariant training; mask estimation
DOI
10.21437/Interspeech.2018-1739
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The recently proposed Permutation Invariant Training (PIT) technique addresses the label permutation problem in multi-talker speech separation. It has been shown to be effective in the single-channel separation case. In this paper, we propose to extend the PIT-based technique to the multichannel multi-talker speech separation scenario. PIT is used to train a neural network that outputs a mask for each speaker, and the network is followed by a Minimum Variance Distortionless Response (MVDR) beamformer. The beamformer exploits the spatial information of the different speakers and alleviates the performance degradation caused by misaligned labels. Experimental results show that the proposed PIT-MVDR-based technique achieves higher Signal-to-Distortion Ratios (SDRs) than the single-channel speech separation method when tested on two-speaker and three-speaker mixtures.
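The two stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the mean-squared-error objective, the mask-weighted spatial covariance estimates, the reference-microphone MVDR formulation, and the names `pit_mse`, `mask_mvdr`, `ref_mic`, and `eps` are all assumptions made for illustration, since the abstract does not specify them.

```python
import itertools
import numpy as np


def pit_mse(estimates, references):
    """Permutation-invariant MSE (assumed objective, not confirmed by the abstract).

    estimates, references: arrays of shape (speakers, frames, bins).
    Returns the lowest loss over all speaker orderings and the ordering itself,
    where best_perm[i] is the estimate index assigned to reference i.
    """
    num_spk = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(num_spk)):
        loss = np.mean((estimates[list(perm)] - references) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def mask_mvdr(stft, mask, ref_mic=0, eps=1e-8):
    """Mask-driven MVDR beamformer (one common formulation, assumed here).

    stft: complex array of shape (channels, frames, bins).
    mask: real array of shape (frames, bins) with values in [0, 1] for the
          target speaker; 1 - mask is treated as the interference mask.
    Returns the beamformed STFT of shape (frames, bins).
    """
    num_ch, num_frames, num_bins = stft.shape
    out = np.zeros((num_frames, num_bins), dtype=complex)
    for f in range(num_bins):
        y = stft[:, :, f]                       # (channels, frames)
        m = mask[:, f]                          # (frames,)
        # Mask-weighted spatial covariance matrices of target and interference.
        phi_s = (m * y) @ y.conj().T / (m.sum() + eps)
        phi_n = ((1 - m) * y) @ y.conj().T / ((1 - m).sum() + eps)
        phi_n = phi_n + eps * np.eye(num_ch)    # diagonal loading for stability
        ratio = np.linalg.solve(phi_n, phi_s)   # phi_n^{-1} phi_s
        w = ratio[:, ref_mic] / (np.trace(ratio) + eps)
        out[:, f] = w.conj() @ y                # apply w^H to every frame
    return out
```

In a pipeline of the kind the abstract describes, `pit_mse` would be used only during training to pick the speaker ordering, while `mask_mvdr` would be called once per estimated speaker mask at inference time.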
Pages: 851-855
Number of pages: 5