A Speaker-Dependent Approach to Separation of Far-Field Multi-Talker Microphone Array Speech for Front-End Processing in the CHiME-5 Challenge

Cited by: 12
Authors
Sun, Lei [1 ]
Du, Jun [1 ]
Gao, Tian [2 ]
Fang, Yi [2 ]
Ma, Feng [2 ]
Lee, Chin-Hui [3 ]
Affiliations
[1] Univ Sci & Technol China, Hefei 230052, Anhui, Peoples R China
[2] iFlytek, Hefei 230088, Anhui, Peoples R China
[3] Georgia Inst Technol, Atlanta, GA 30332 USA
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
The CHiME-5 challenge; speech enhancement; speech separation; mask estimation; robust speech recognition; BLIND SOURCE SEPARATION; INDEPENDENT COMPONENT ANALYSIS; ENHANCEMENT; RECOGNITION;
DOI
10.1109/JSTSP.2019.2920764
Chinese Library Classification
TM [Electrotechnics]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
We propose a novel speaker-dependent speech separation framework for the challenging CHiME-5 acoustic environments, exploiting advantages of both deep learning based and conventional preprocessing techniques to prepare data effectively for separating target speech from multi-talker mixed speech collected with multiple microphone arrays. First, a series of multi-channel operations is conducted to reduce existing reverberation and noise, and a single-channel deep learning based speech enhancement model is used to predict speech presence probabilities. Next, a two-stage supervised speech separation approach, using oracle speaker diarization information from CHiME-5, is proposed to separate the speech of a target speaker from interfering speakers in mixed speech. Given a set of three estimated masks for the background noise, the target speaker, and the interfering speakers from the single-channel speech enhancement and separation models, a complex Gaussian mixture model based generalized eigenvalue beamformer is then used to enhance the signal at the reference array while avoiding the speaker permutation issue. Furthermore, the proposed front-end can generate a large variety of processed data for an ensemble of speech recognition results. Experiments on the development set show that the proposed two-stage approach yields significant improvements in recognition performance over the official baseline system and achieved the top accuracies in all four competing evaluation categories among all systems submitted to the CHiME-5 Challenge.
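The mask-driven generalized eigenvalue (GEV) beamforming step described in the abstract can be sketched as follows. This is an illustrative implementation of the standard mask-based GEV formulation (estimate speech and noise spatial covariance matrices from time-frequency masks, then take the principal generalized eigenvector per frequency bin), not the authors' code; all function and variable names are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def gev_beamform(stft, speech_mask, noise_mask):
    """Mask-based GEV beamforming sketch (illustrative, not the paper's code).

    stft:        (F, T, C) complex STFT of the multi-channel mixture
    speech_mask: (F, T) speech presence probabilities in [0, 1]
    noise_mask:  (F, T) noise presence probabilities in [0, 1]
    Returns:     (F, T) complex STFT of the beamformed signal
    """
    F, T, C = stft.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        X = stft[f]                       # (T, C), one frame per row
        ms = speech_mask[f][:, None]      # (T, 1)
        mn = noise_mask[f][:, None]
        # Mask-weighted spatial covariance matrices: sum_t m(t) x(t) x(t)^H
        phi_s = X.T @ (ms * X.conj()) / max(ms.sum(), 1e-8)
        phi_n = X.T @ (mn * X.conj()) / max(mn.sum(), 1e-8)
        phi_n = phi_n + 1e-6 * np.eye(C)  # regularize for invertibility
        # GEV criterion: maximize (w^H phi_s w) / (w^H phi_n w);
        # eigh solves the generalized Hermitian eigenproblem, ascending order
        _, vecs = eigh(phi_s, phi_n)
        w = vecs[:, -1]                   # principal generalized eigenvector
        out[f] = X @ w.conj()             # y(t) = w^H x(t)
    return out
```

The GEV solution is defined only up to a complex scale, so practical systems typically follow it with a normalization such as blind analytic normalization; that step is omitted here for brevity.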
Pages: 827-840
Number of pages: 14