Mask-based blind source separation and MVDR beamforming in ASR

Cited by: 3
Authors
He, Renke [1]
Long, Yanhua [1]
Li, Yijie [2]
Liang, Jiaen [2]
Affiliations
[1] Shanghai Normal Univ, Dept Elect & Informat Engn, Shanghai 200234, Peoples R China
[2] Unisound AI Technol Co Ltd, Beijing 100089, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cocktail party problem; MVDR; BSS; T-F masking; Speech enhancement; SPEECH SEPARATION; MIXTURES;
DOI
10.1007/s10772-019-09666-x
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Subject classification codes
0808 ; 0809 ;
Abstract
This paper presents a front-end enhancement system for automatic speech recognition (ASR) that addresses the cocktail party problem: recognizing the target speech when multiple speakers talk in noisy real-world environments. Many conventional techniques have been proposed for this problem. In this work, we propose a new framework that integrates conventional blind source separation (BSS) with a minimum variance distortionless response (MVDR) beamformer for speech enhancement and source separation in the recent CHiME-5 challenge. In our experiments, we found that the BSS-based time-frequency (T-F) mask estimation strategy should differ between speech enhancement and source separation. The main difference is whether background noise needs to be modeled as an additional class during T-F mask estimation. Experimental results showed that the proposed framework improved speech recognition performance on the single-array track of CHiME-5. We obtained a 13.5% relative WER reduction over the official baseline system by improving only the front-end speech enhancement framework.
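As an illustration of the general idea behind mask-based MVDR beamforming, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the T-F masks are already available (in the paper they come from a BSS algorithm, which is not reproduced here), and it uses the common reference-channel MVDR formulation w(f) = Φ_n⁻¹Φ_s u / tr(Φ_n⁻¹Φ_s), which the abstract does not confirm as the exact variant used. The function names, array layout (frequency, time, channel), and the diagonal-loading constant are all assumptions made for this example.

```python
import numpy as np


def masked_covariance(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance matrix per frequency bin.

    stft: complex array of shape (F, T, C) = (frequency, time, channel).
    mask: real array of shape (F, T) with values in [0, 1].
    Returns a complex array of shape (F, C, C).
    """
    weighted = mask[..., None] * stft
    # Element (c, d) is sum_t mask * y_c * conj(y_d), i.e. sum_t mask * y y^H.
    cov = np.einsum("ftc,ftd->fcd", weighted, stft.conj())
    return cov / (mask.sum(axis=1)[:, None, None] + eps)


def mvdr_weights(cov_speech, cov_noise, ref_channel=0, diag_load=1e-6):
    """Reference-channel MVDR weights of shape (F, C).

    diag_load is a hypothetical regularization constant that keeps the
    noise covariance well conditioned before it is inverted.
    """
    n_ch = cov_noise.shape[-1]
    load = diag_load * np.trace(cov_noise, axis1=-2, axis2=-1).real / n_ch
    cov_noise = cov_noise + load[:, None, None] * np.eye(n_ch)
    # Solve cov_noise @ X = cov_speech for every frequency bin at once.
    numerator = np.linalg.solve(cov_noise, cov_speech)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[..., ref_channel] / (trace[:, None] + 1e-10)


def apply_beamformer(weights, stft):
    """Apply w(f)^H y(t, f); returns the enhanced STFT of shape (F, T)."""
    return np.einsum("fc,ftc->ft", weights.conj(), stft)
```

In this sketch, a speech mask and a noise mask would each be passed to `masked_covariance`, and the two resulting covariance matrices to `mvdr_weights`. Whether the noise mask is estimated as a separate class or taken as the complement of the speaker masks is exactly the design choice the abstract reports as differing between speech enhancement and source separation.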
Pages: 133 - 140
Page count: 8
Related papers
50 in total
  • [1] Mask-based blind source separation and MVDR beamforming in ASR
    Renke He
    Yanhua Long
    Yijie Li
    Jiaen Liang
    International Journal of Speech Technology, 2020, 23 : 133 - 140
  • [2] Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming
    Masuyama, Yoshiki
    Togami, Masahito
    Komatsu, Tatsuya
    INTERSPEECH 2019, 2019, : 2708 - 2712
  • [3] DNN-SUPPORTED MASK-BASED CONVOLUTIONAL BEAMFORMING FOR SIMULTANEOUS DENOISING, DEREVERBERATION, AND SOURCE SEPARATION
    Nakatani, Tomohiro
    Takahashi, Riki
    Ochiai, Tsubasa
    Kinoshita, Keisuke
    Ikeshita, Rintaro
    Delcroix, Marc
    Araki, Shoko
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6399 - 6403
  • [4] FRAME-BY-FRAME CLOSED-FORM UPDATE FOR MASK-BASED ADAPTIVE MVDR BEAMFORMING
    Higuchi, Takuya
    Kinoshita, Keisuke
    Ito, Nobutaka
    Karita, Shigeki
    Nakatani, Tomohiro
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 531 - 535
  • [5] Improvement of Mask-Based Speech Source Separation Using DNN
    Zhan, Ge
    Huang, Zhaoqiong
    Ying, Dongwen
    Pan, Jielin
    Yan, Yonghong
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [6] Unsupervised training of neural mask-based beamforming
    Drude, Lukas
    Heymann, Jahn
    Haeb-Umbach, Reinhold
    INTERSPEECH 2019, 2019, : 1253 - 1257
  • [7] Wideband Blind Source Separation Algorithm Based on Beamforming
    Weihong Fu
    Yichen Zhang
    Wireless Personal Communications, 2019, 108 : 221 - 234
  • [8] Wideband Blind Source Separation Algorithm Based on Beamforming
    Fu, Weihong
    Zhang, Yichen
    WIRELESS PERSONAL COMMUNICATIONS, 2019, 108 (01) : 221 - 234
  • [9] Research on blind source separation and blind beamforming
    Zhao, B
    Yang, JA
    Zhang, M
    Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005, : 4389 - 4393
  • [10] Directional interference suppression based on blind source separation with beamforming
    Kang, Chun-Yu
    Zidonghua Xuebao/Acta Automatica Sinica, 2014, 40 (05) : 983 - 987