Mask-based blind source separation and MVDR beamforming in ASR

Cited by: 3
Authors
He, Renke [1]
Long, Yanhua [1]
Li, Yijie [2]
Liang, Jiaen [2]
Affiliations
[1] Shanghai Normal Univ, Dept Elect & Informat Engn, Shanghai 200234, Peoples R China
[2] Unisound AI Technol Co Ltd, Beijing 100089, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cocktail party problem; MVDR; BSS; T-F masking; Speech enhancement; SPEECH SEPARATION; MIXTURES;
DOI
10.1007/s10772-019-09666-x
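Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract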
This paper presents a front-end enhancement system for automatic speech recognition (ASR) that addresses the cocktail party problem. The cocktail party problem concerns recognizing the target speech when multiple speakers talk in noisy real-world environments. Many conventional techniques have been proposed. In this work, we propose a new framework that integrates conventional blind source separation (BSS) with a minimum variance distortionless response (MVDR) beamformer for speech enhancement and source separation in the recent CHiME-5 challenge. In our experiments, we found that the time-frequency (T-F) mask estimation strategy based on the BSS algorithm should differ between speech enhancement and source separation. The main difference is whether background noise needs to be accounted for as an additional class during T-F mask estimation. Experimental results showed that the proposed framework was very beneficial for improving speech recognition performance on the single-array track of CHiME-5. We obtained a 13.5% relative WER reduction over the official baseline system by improving only the front-end speech enhancement framework.
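The abstract does not give the beamformer equations, so the following is only a minimal sketch of generic mask-based MVDR beamforming, not the authors' implementation. It assumes the T-F masks are already available (in the paper they come from a BSS algorithm), uses the common reference-microphone MVDR form w = (Φ_n⁻¹ Φ_s) u / tr(Φ_n⁻¹ Φ_s), and the function name `mask_based_mvdr` and its parameters are hypothetical.

```python
import numpy as np


def mask_based_mvdr(Y, speech_mask, noise_mask, ref_mic=0, eps=1e-10):
    """Hypothetical helper: mask-based MVDR on a multichannel STFT.

    Y           : complex array, shape (F, T, M) = (freq bins, frames, mics)
    speech_mask : real array in [0, 1], shape (F, T)
    noise_mask  : real array in [0, 1], shape (F, T)
    Returns the beamformed STFT, shape (F, T).
    """
    n_mics = Y.shape[-1]

    def spatial_covariance(mask):
        # Mask-weighted average of y y^H per frequency bin, shape (F, M, M).
        num = np.einsum("ft,ftm,ftn->fmn", mask, Y, Y.conj())
        return num / (mask.sum(axis=1)[:, None, None] + eps)

    phi_s = spatial_covariance(speech_mask)
    # Small diagonal loading keeps the noise covariance invertible (assumption).
    phi_n = spatial_covariance(noise_mask) + eps * np.eye(n_mics)

    # Solve Phi_n X = Phi_s per frequency instead of inverting explicitly.
    numer = np.linalg.solve(phi_n, phi_s)                  # (F, M, M)
    trace = np.trace(numer, axis1=1, axis2=2)[:, None]     # (F, 1)
    w = numer[:, :, ref_mic] / (trace + eps)               # (F, M)

    # Apply the filter: x_hat(f, t) = w(f)^H y(f, t).
    return np.einsum("fm,ftm->ft", w.conj(), Y)
```

For speech enhancement one might pass `noise_mask = 1 - speech_mask`; when separating speakers, the abstract's observation suggests deciding whether background noise gets its own mask class, which would change what is fed in as `noise_mask`.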
Pages: 133-140
Number of pages: 8
Related papers
50 in total
  • [21] ONLINE MEETING RECOGNITION IN NOISY ENVIRONMENTS WITH TIME-FREQUENCY MASK BASED MVDR BEAMFORMING
    Araki, Shoko
    Ito, Nobutaka
    Delcroix, Marc
    Ogawa, Atsunori
    Kinoshita, Keisuke
    Higuchi, Takuya
    Yoshioka, Takuya
    Dung Tran
    Karita, Shigeki
    Nakatani, Tomohiro
    2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017), 2017, : 16 - 20
  • [22] Fast convergence blind source separation based on frequency subband interpolation by null beamforming
    Osako, Keiichi
    Mori, Yoshimitsu
    Takahashi, Yu
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    2007 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, 2007, : 105 - 108
  • [23] Effect of Window Length in Combining Blind Source Separation and Beamforming
    Johnson, R. Ruben
    2017 IEEE 3RD INTERNATIONAL CONFERENCE ON SENSING, SIGNAL PROCESSING AND SECURITY (ICSSS), 2017, : 447 - 449
  • [24] Blind source separation based on a fast-convergence algorithm combining ICA and beamforming
    Saruwatari, H
    Kawamura, T
    Nishikawa, T
    Lee, A
    Shikano, K
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (02): : 666 - 678
  • [25] Blind Source Separation Combining Independent Component Analysis and Beamforming
    Saruwatari, Hiroshi
    Kurita, Satoshi
    Takeda, Kazuya
    Itakura, Fumitada
    Nishikawa, Tsuyoki
    Shikano, Kiyohiro
    EURASIP Journal on Advances in Signal Processing, 2003
  • [26] Blind source separation combining independent component analysis and beamforming
    Saruwatari, H
    Kurita, S
    Takeda, K
    Itakura, F
    Nishikawa, T
    Shikano, K
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2003, 2003 (11) : 1135 - 1146
  • [27] Blind source separation combining simo-model-based ICA and adaptive beamforming
    Ukai, S
    Takatani, T
    Nishikawa, T
    Saruwatari, H
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 85 - 88
  • [28] Blind Source Separation Combining Independent Component Analysis and Beamforming
    Saruwatari, Hiroshi
    Kurita, Satoshi
    Takeda, Kazuya
    Itakura, Fumitada
    Nishikawa, Tsuyoki
    Shikano, Kiyohiro
    Eurasip Journal on Applied Signal Processing, 2003, 2003 (11): : 1135 - 1146
  • [29] Detection in present of reverberation Combined with Blind Source Separation and Beamforming
    Xu, Ce
    Zhang, Xinhua
    Xu, Zhaoyan
    2ND IEEE INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER CONTROL (ICACC 2010), VOL. 4, 2010, : 158 - 162
  • [30] Blind Adaptive Principal Eigenvector Beamforming for Acoustical Source Separation
    Warsitz, Ernst
    Haeb-Umbach, Reinhold
    Vu, Dang Hai Tran
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 461 - 464