Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming

Cited by: 8
Authors:
Yin, Lu [1 ,2 ]
Wang, Ziteng [1 ,2 ]
Xia, Risheng [1 ]
Li, Junfeng [1 ,2 ]
Yan, Yonghong [1 ,2 ,3 ]
Affiliations:
[1] Chinese Acad Sci, Inst Acoust, Key Lab Speech Acoust & Content Understanding, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Xinjiang Lab Minor Speech & Language Informat Pro, Beijing, Peoples R China
Keywords:
multi-channel speech separation; beamforming; permutation invariant training; mask estimation;
DOI:
10.21437/Interspeech.2018-1739
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
The recently proposed Permutation Invariant Training (PIT) technique addresses the label permutation problem in multi-talker speech separation and has been shown to be effective in the single-channel case. In this paper, we extend the PIT-based technique to the multi-channel multi-talker speech separation scenario. PIT is used to train a neural network that outputs a mask for each speaker, followed by a Minimum Variance Distortionless Response (MVDR) beamformer. The beamformer exploits the spatial information of the different speakers and alleviates the performance degradation caused by misaligned labels. Experimental results show that the proposed PIT-MVDR-based technique achieves higher Signal-to-Distortion Ratios (SDRs) than the single-channel speech separation method when tested on two-speaker and three-speaker mixtures.
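As a rough illustration of the two stages the abstract describes, here is a minimal NumPy sketch (function names and shapes are my own assumptions, not taken from the paper): a PIT loss that searches over speaker permutations to sidestep the label-permutation problem, and per-frequency MVDR weights computed from spatial covariance matrices that would typically be accumulated from mask-weighted STFT frames.

```python
import itertools
import numpy as np

def pit_mse_loss(estimates, references):
    """Utterance-level permutation invariant MSE loss.

    estimates, references: arrays of shape (num_speakers, num_frames, num_bins).
    Returns the minimum MSE over all speaker-label permutations, together
    with the permutation that achieves it.
    """
    num_speakers = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(num_speakers)):
        loss = np.mean((estimates[list(perm)] - references) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

def mvdr_weights(phi_speech, phi_noise):
    """MVDR beamformer weights for a single frequency bin.

    phi_speech, phi_noise: (num_mics, num_mics) Hermitian spatial covariance
    matrices. The steering vector is taken as the principal eigenvector of
    phi_speech, a common choice in mask-based beamforming.
    """
    _, vecs = np.linalg.eigh(phi_speech)
    steer = vecs[:, -1]                       # principal eigenvector
    num = np.linalg.solve(phi_noise, steer)   # Phi_n^{-1} d
    return num / (steer.conj() @ num)         # distortionless normalization
```

Note that the exhaustive permutation search costs O(S!) for S speakers, which is negligible for the two- and three-speaker mixtures evaluated in the paper.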
Pages: 851-855 (5 pages)
Related papers (50 records):
  • [1] Recognizing Multi-talker Speech with Permutation Invariant Training
    Yu, Dong
    Chang, Xuankai
    Qian, Yanmin
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2456 - 2460
  • [2] PERMUTATION INVARIANT TRAINING OF DEEP MODELS FOR SPEAKER-INDEPENDENT MULTI-TALKER SPEECH SEPARATION
    Yu, Dong
    Kolbaek, Morten
    Tan, Zheng-Hua
    Jensen, Jesper
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 241 - 245
  • [3] Permutation invariant training of deep models for speaker-independent multi-talker speech separation
    Takahashi, Kohei
    Shiraishi, Toshihiko
    [J]. MECHANICAL ENGINEERING JOURNAL, 2023,
  • [4] Single-channel multi-talker speech recognition with permutation invariant training
    Qian, Yanmin
    Chang, Xuankai
    Yu, Dong
    [J]. SPEECH COMMUNICATION, 2018, 104 : 1 - 11
  • [5] ADAPTIVE PERMUTATION INVARIANT TRAINING WITH AUXILIARY INFORMATION FOR MONAURAL MULTI-TALKER SPEECH RECOGNITION
    Chang, Xuankai
    Qian, Yanmin
    Yu, Dong
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5974 - 5978
  • [6] JOINT SEPARATION AND DENOISING OF NOISY MULTI-TALKER SPEECH USING RECURRENT NEURAL NETWORKS AND PERMUTATION INVARIANT TRAINING
    Kolbaek, Morten
    Yu, Dong
    Tan, Zheng-Hua
    Jensen, Jesper
    [J]. 2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2017,
  • [7] KNOWLEDGE TRANSFER IN PERMUTATION INVARIANT TRAINING FOR SINGLE-CHANNEL MULTI-TALKER SPEECH RECOGNITION
    Tan, Tian
    Qian, Yanmin
    Yu, Dong
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5714 - 5718
  • [8] A microphone array beamforming-based system for multi-talker speech separation
    Hidri, Adel
    Amiri, Hamid
    [J]. INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2016, 9 (4-5) : 209 - 217
  • [9] Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation
    Huang, Lu
    Cheng, Gaofeng
    Zhang, Pengyuan
    Yang, Yi
    Xu, Shumin
    Sun, Jiasong
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1256 - 1261
  • [10] Probabilistic Permutation Invariant Training for Speech Separation
    Yousefi, Midia
    Khorram, Soheil
    Hansen, John H. L.
    [J]. INTERSPEECH 2019, 2019, : 4604 - 4608