Neural Spatio-Temporal Beamformer for Target Speech Separation

Cited by: 20
Authors
Xu, Yong [1]
Yu, Meng [1]
Zhang, Shi-Xiong [1]
Chen, Lianwu [2]
Weng, Chao [1]
Liu, Jianming [1]
Yu, Dong [1]
Affiliations
[1] Tencent AI Lab, Bellevue, WA 98004 USA
[2] Tencent AI Lab, Shenzhen, People's Republic of China
Source
Keywords
target speech separation; multi-tap MVDR; mask-based MVDR; spatio-temporal beamformer; NOISE-REDUCTION; ENHANCEMENT; RECOGNITION; END;
DOI
10.21437/Interspeech.2020-1458
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Purely neural network (NN) based speech separation and enhancement methods can achieve good objective scores, but they inevitably introduce nonlinear speech distortions that are harmful to automatic speech recognition (ASR). On the other hand, the minimum variance distortionless response (MVDR) beamformer with NN-predicted masks can significantly reduce speech distortions but has limited noise reduction capability. In this paper, we propose a multi-tap MVDR beamformer with complex-valued masks for speech separation and enhancement. Compared to the state-of-the-art NN-mask based MVDR beamformer, the multi-tap MVDR beamformer exploits the inter-frame correlation in addition to the inter-microphone correlation already utilized in prior art. Further improvements include replacing the real-valued masks with complex-valued masks and jointly training the complex-mask NN. Evaluation on our multi-modal multi-channel target speech separation and enhancement platform demonstrates that the proposed multi-tap MVDR beamformer improves both ASR accuracy and perceptual speech quality over prior art.
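The multi-tap idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the complex masks are already available (in the paper they are predicted by a jointly trained NN), and it assumes the common reference-channel form of the MVDR solution, w = Φ_NN⁻¹ Φ_SS u / trace(Φ_NN⁻¹ Φ_SS), which may differ from the exact formula used in the paper. The function names `stack_taps`, `covariance`, and `multi_tap_mvdr` are hypothetical.

```python
import numpy as np


def stack_taps(Y, num_taps):
    """Stack each frame with its previous num_taps - 1 frames.

    Y: complex STFT of shape (M, T, F) for M microphones.
    Returns shape (M * num_taps, T, F); frames before t = 0 are zeros.
    """
    M, T, F = Y.shape
    taps = []
    for lag in range(num_taps):
        shifted = np.zeros_like(Y)
        shifted[:, lag:, :] = Y[:, :T - lag, :]
        taps.append(shifted)
    return np.concatenate(taps, axis=0)


def covariance(X):
    """Time-averaged covariance per frequency. X: (D, T, F) -> (F, D, D)."""
    return np.einsum('dtf,etf->fde', X, X.conj()) / X.shape[1]


def multi_tap_mvdr(Y, speech_mask, noise_mask, num_taps=2, ref=0, eps=1e-6):
    """Apply a mask-based multi-tap MVDR beamformer (illustrative sketch).

    speech_mask, noise_mask: complex masks of shape (T, F), assumed given.
    Returns the beamformed STFT of shape (T, F).
    """
    Y_bar = stack_taps(Y, num_taps)
    # Complex masks are applied to the mixture before stacking, so the
    # covariances capture inter-microphone and inter-frame correlation.
    phi_ss = covariance(stack_taps(speech_mask * Y, num_taps))
    phi_nn = covariance(stack_taps(noise_mask * Y, num_taps))
    D = phi_nn.shape[-1]
    phi_nn = phi_nn + eps * np.eye(D)            # diagonal loading
    numerator = np.linalg.solve(phi_nn, phi_ss)  # Phi_NN^{-1} Phi_SS, (F, D, D)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    # Assumed reference-channel solution: pick column `ref`, normalize by trace.
    w = numerator[:, :, ref] / (trace[:, None] + eps)  # (F, D)
    return np.einsum('fd,dtf->tf', w.conj(), Y_bar)    # w^H applied per bin
```

With `num_taps=1` the sketch reduces to a conventional mask-based MVDR beamformer that uses only inter-microphone correlation. Larger values enlarge the covariance matrices to `M * num_taps` dimensions, which is how inter-frame correlation enters the filter.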
Pages: 56-60
Page count: 5
Related papers
50 items in total
  • [1] Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation
    Xu, Yong
    Zhang, Zhuohuang
    Yu, Meng
    Zhang, Shi-Xiong
    Yu, Dong
    INTERSPEECH 2021, 2021, : 3076 - 3080
  • [2] A spatio-temporal neural network applied to visual speech recognition
    Baig, AR
    Séguier, R
    Vaucher, G
    NINTH INTERNATIONAL CONFERENCE ON ARTIFICIAL NEURAL NETWORKS (ICANN99), VOLS 1 AND 2, 1999, (470): : 797 - 802
  • [3] Reconstructing spatio-temporal activities of neural sources using an MEG vector beamformer technique
    Sekihara, K
    Nagarajan, SS
    Poeppel, D
    Marantz, A
    Miyashita, Y
    IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2001, 48 (07) : 760 - 771
  • [4] Reconstructing spatio-temporal activities of neural sources from magnetoencephalographic data using a vector beamformer
    Sekihara, K
    Nagarajan, S
    Poeppel, D
    Miyashita, Y
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 2021 - 2024
  • [5] Blind separation of spatio-temporal Synfire sources and visualization of neural cliques
    Unger, Hilit
    Zeevi, Yehoshua Y.
    NEUROCOMPUTING, 2006, 69 (13-15) : 1475 - 1484
  • [6] ALL-NEURAL BEAMFORMER FOR CONTINUOUS SPEECH SEPARATION
    Zhang, Zhuohuang
    Yoshioka, Takuya
    Kanda, Naoyuki
    Chen, Zhuo
    Wang, Xiaofei
    Wang, Dongmei
    Eskimez, Sefik Emre
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6032 - 6036
  • [7] Spatio-Temporal Functional Neural Networks
    Rao, Aniruddha Rajendra
    Wang, Qiyao
    Wang, Haiyan
    Khorasgani, Hamed
    Gupta, Chetan
    2020 IEEE 7TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2020), 2020, : 81 - 89
  • [8] Spatio-Temporal RBF Neural Networks
    Khan, Shujaat
    Ahmad, Jawwad
    Sadiq, Alishba
    Naseem, Imran
    Moinuddin, Muhammad
    2018 3RD INTERNATIONAL CONFERENCE ON EMERGING TRENDS IN ENGINEERING, SCIENCES AND TECHNOLOGY (ICEEST), 2018,
  • [9] Spatio-temporal processing for distant speech recognition
    Low, SY
    Togneri, R
    Nordholm, S
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 1001 - 1004
  • [10] Spatio-temporal dynamics of turbulent separation bubbles
    Wu, Wen
    Meneveau, Charles
    Mittal, Rajat
    JOURNAL OF FLUID MECHANICS, 2020, 883