Mask-based blind source separation and MVDR beamforming in ASR

Cited by: 3
Authors
He, Renke [1]
Long, Yanhua [1]
Li, Yijie [2]
Liang, Jiaen [2]
Affiliations
[1] Shanghai Normal Univ, Dept Elect & Informat Engn, Shanghai 200234, Peoples R China
[2] Unisound AI Technol Co Ltd, Beijing 100089, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cocktail party problem; MVDR; BSS; T-F masking; Speech enhancement; Speech separation; Mixtures
DOI
10.1007/s10772-019-09666-x
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Subject classification codes
0808; 0809
Abstract
This paper presents a front-end enhancement system for automatic speech recognition (ASR) that addresses the cocktail party problem. The cocktail party problem focuses on recognizing the target speech when multiple speakers talk in noisy real-world environments. Many conventional techniques have been proposed for it. In this work, we propose a new framework that integrates conventional blind source separation (BSS) with a minimum variance distortionless response (MVDR) beamformer for the speech enhancement and source separation tasks of the recent CHiME-5 challenge. In our experiments, we found that the BSS-based time-frequency (T-F) mask estimation strategy should differ between speech enhancement and source separation. The main difference is whether background noise needs to be accounted for as an additional class during T-F mask estimation. Experimental results show that the proposed framework is beneficial for speech recognition performance on the single-array track of CHiME-5. We obtained a 13.5% relative word error rate (WER) reduction over the official baseline system by improving only the front-end speech enhancement framework.
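The abstract does not give the beamformer equations, so the following is only a minimal sketch of how T-F masks are commonly turned into MVDR weights, not the authors' implementation. It assumes a multichannel STFT of shape (channels, frequencies, frames), a speech mask and a noise mask per T-F bin (for example, the noise mask could come from the extra background-noise class mentioned above), and the widely used reference-microphone MVDR formulation. The function name `mask_based_mvdr` and all parameter names are hypothetical.

```python
import numpy as np

def mask_based_mvdr(stft, speech_mask, noise_mask, ref_mic=0, eps=1e-8):
    """Apply a mask-driven MVDR beamformer (illustrative sketch).

    stft:        complex array, shape (C, F, T) = (mics, freq bins, frames)
    speech_mask: real array in [0, 1], shape (F, T)
    noise_mask:  real array in [0, 1], shape (F, T)
    Returns the beamformed STFT, shape (F, T).
    """
    n_mics, n_freq, n_frames = stft.shape
    out = np.zeros((n_freq, n_frames), dtype=complex)
    for f in range(n_freq):
        Y = stft[:, f, :]  # (C, T) observations at this frequency
        # Mask-weighted spatial covariance matrices of speech and noise.
        phi_s = (speech_mask[f] * Y) @ Y.conj().T / (speech_mask[f].sum() + eps)
        phi_n = (noise_mask[f] * Y) @ Y.conj().T / (noise_mask[f].sum() + eps)
        phi_n = phi_n + eps * np.eye(n_mics)  # small diagonal loading
        # Reference-microphone MVDR:
        #   w = (phi_n^-1 phi_s) u_ref / trace(phi_n^-1 phi_s)
        num = np.linalg.solve(phi_n, phi_s)
        w = num[:, ref_mic] / (np.trace(num) + eps)
        out[f] = w.conj() @ Y  # w^H y for every frame
    return out
```

With this formulation the target component at the reference microphone passes through undistorted when the speech covariance is rank one, which is the "distortionless" constraint in MVDR; how well that holds in practice depends on the quality of the estimated masks.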
Pages: 133-140
Page count: 8