Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming

Cited: 8
Authors
Yin, Lu [1 ,2 ]
Wang, Ziteng [1 ,2 ]
Xia, Risheng [1 ]
Li, Junfeng [1 ,2 ]
Yan, Yonghong [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Xinjiang Lab Minor Speech & Language Informat Pro, Beijing, Peoples R China
Keywords
multi-channel speech separation; beamforming; permutation invariant training; mask estimation;
DOI
10.21437/Interspeech.2018-1739
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The recently proposed Permutation Invariant Training (PIT) technique addresses the label permutation problem for multi-talker speech separation. It has been shown to be effective in the single-channel separation case. In this paper, we propose to extend the PIT-based technique to the multi-channel multi-talker speech separation scenario. PIT is used to train a neural network that outputs masks for each separate speaker, which is followed by a Minimum Variance Distortionless Response (MVDR) beamformer. The beamformer utilizes the spatial information of different speakers and alleviates the performance degradation caused by misaligned labels. Experimental results show that the proposed PIT-MVDR-based technique leads to higher Signal-to-Distortion Ratios (SDRs) than the single-channel speech separation method when tested on two-speaker and three-speaker mixtures.
Pages: 851-855
Number of pages: 5
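
The abstract describes a two-stage pipeline: a PIT-trained network first estimates a time-frequency mask for each speaker, and a mask-based MVDR beamformer then uses those masks to exploit spatial information across microphones. As a rough illustration of both ingredients, here is a minimal NumPy sketch, assuming utterance-level PIT with a mean-squared error over masked magnitude spectra and an MVDR formulation with a principal-eigenvector steering vector and diagonal loading; all function names, tensor shapes, and constants are illustrative choices, not details taken from the paper.

    import itertools
    import numpy as np

    def pit_mse_loss(est_masks, ref_mags, mix_mag):
        # Utterance-level PIT: evaluate the MSE between masked mixture spectra
        # and reference spectra under every speaker permutation, and keep the
        # smallest value as the training loss.
        #   est_masks: (S, T, F) masks, one per network output stream
        #   ref_mags:  (S, T, F) reference magnitude spectrograms
        #   mix_mag:   (T, F) mixture magnitude spectrogram
        n_spk = est_masks.shape[0]
        est_mags = est_masks * mix_mag
        losses = []
        for perm in itertools.permutations(range(n_spk)):
            losses.append(np.mean((est_mags - ref_mags[list(perm)]) ** 2))
        return min(losses)

    def mvdr_weights(mix_stft, speech_mask, eps=1e-6):
        # Mask-based MVDR beamformer for one target speaker.
        #   mix_stft:    (M, T, F) multi-channel mixture STFT, M microphones
        #   speech_mask: (T, F) estimated mask for the target speaker
        # Returns w of shape (F, M); the enhanced signal is y(t, f) = w(f)^H x(t, f).
        n_mic, n_frames, n_freq = mix_stft.shape
        noise_mask = 1.0 - speech_mask
        w = np.zeros((n_freq, n_mic), dtype=complex)
        for f in range(n_freq):
            X = mix_stft[:, :, f]                                    # (M, T)
            phi_s = (speech_mask[:, f] * X) @ X.conj().T / n_frames  # target PSD
            phi_n = (noise_mask[:, f] * X) @ X.conj().T / n_frames   # noise PSD
            phi_n += eps * np.eye(n_mic)                             # diagonal loading
            # Steering vector taken as the principal eigenvector of the target PSD.
            _, eigvecs = np.linalg.eigh(phi_s)
            d = eigvecs[:, -1]
            num = np.linalg.solve(phi_n, d)                          # phi_n^{-1} d
            w[f] = num / (d.conj() @ num)                            # MVDR solution
        return w

For the two- and three-speaker mixtures evaluated in the paper, the exhaustive permutation search inside the PIT loss costs only 2! = 2 or 3! = 6 comparisons per utterance, which is why utterance-level PIT remains practical at this scale.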