Neural Spatio-Temporal Beamformer for Target Speech Separation

Cited by: 20
Authors
Xu, Yong [1]
Yu, Meng [1]
Zhang, Shi-Xiong [1]
Chen, Lianwu [2]
Weng, Chao [1]
Liu, Jianming [1]
Yu, Dong [1]
Affiliations
[1] Tencent AI Lab, Bellevue, WA 98004 USA
[2] Tencent AI Lab, Shenzhen, Peoples R China
Source
Keywords
target speech separation; multi-tap MVDR; mask-based MVDR; spatio-temporal beamformer; NOISE-REDUCTION; ENHANCEMENT; RECOGNITION; END;
DOI
10.21437/Interspeech.2020-1458
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Purely neural network (NN) based speech separation and enhancement methods can achieve good objective scores, but they inevitably introduce nonlinear speech distortions that are harmful to automatic speech recognition (ASR). The minimum variance distortionless response (MVDR) beamformer with NN-predicted masks, on the other hand, significantly reduces speech distortion but has limited noise reduction capability. In this paper, we propose a multi-tap MVDR beamformer with complex-valued masks for speech separation and enhancement. Compared to the state-of-the-art NN-mask based MVDR beamformer, the multi-tap MVDR beamformer exploits the inter-frame correlation in addition to the inter-microphone correlation already used in prior art. Further improvements include replacing the real-valued masks with complex-valued masks and jointly training the complex-mask NN. Evaluation on our multi-modal multi-channel target speech separation and enhancement platform demonstrates that the proposed multi-tap MVDR beamformer improves both ASR accuracy and perceptual speech quality over prior art.
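The multi-tap idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `multi_tap_mvdr`, its parameters, the diagonal loading term `eps`, and the choice of the widely used mask-based MVDR solution (noise-covariance inverse times speech covariance, normalized by its trace) are all assumptions made for illustration. The sketch stacks the current and previous frames of every microphone so that the covariance matrices capture inter-frame as well as inter-microphone correlation, and it assumes the complex masks are already available rather than predicted by a jointly trained network.

```python
import numpy as np

def multi_tap_mvdr(Y, speech_mask, noise_mask, taps=2, ref=0, eps=1e-6):
    """Hypothetical sketch of a mask-based multi-tap MVDR beamformer.

    Y           : complex STFT of the mixture, shape (mics, freqs, frames)
    speech_mask : complex mask for the target, shape (freqs, frames)
    noise_mask  : complex mask for the noise, shape (freqs, frames)
    taps        : number of stacked frames (taps=1 is the usual MVDR)
    ref         : index of the reference channel in the stacked vector
    Returns the beamformed STFT, shape (freqs, frames).
    """
    M, F, T = Y.shape

    # Stack the current frame and the (taps - 1) previous frames of each mic.
    stacked = []
    for l in range(taps):
        shifted = np.zeros_like(Y)
        shifted[:, :, l:] = Y[:, :, :T - l]   # delay by l frames, zero-padded
        stacked.append(shifted)
    Yb = np.concatenate(stacked, axis=0)      # (M * taps, F, T)
    D = M * taps

    # Mask the stacked mixture to estimate speech and noise components.
    S = speech_mask[None] * Yb
    N = noise_mask[None] * Yb

    # Per-frequency spatio-temporal covariance matrices, shape (F, D, D).
    phi_s = np.einsum('aft,bft->fab', S, S.conj()) / T
    phi_n = np.einsum('aft,bft->fab', N, N.conj()) / T
    phi_n = phi_n + eps * np.eye(D)[None]     # diagonal loading (assumption)

    # w = (phi_n^{-1} phi_s) u / trace(phi_n^{-1} phi_s), u selects `ref`.
    num = np.linalg.solve(phi_n, phi_s)       # (F, D, D)
    trace = np.trace(num, axis1=1, axis2=2)[:, None]
    w = num[:, :, ref] / (trace + eps)        # (F, D)

    # Apply the filter: output(f, t) = w(f)^H Yb(:, f, t).
    return np.einsum('fd,dft->ft', w.conj(), Yb)
```

With `taps=1` the sketch reduces to a conventional mask-based MVDR that uses only inter-microphone correlation; larger values of `taps` enlarge the covariance matrices to `(mics * taps)` per side, which is where the inter-frame correlation enters.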
Pages: 56 - 60
Number of pages: 5
Related papers
50 in total
  • [41] Hybrid neural network for spatio-temporal pattern recognition
    Beijing Univ of Technology, Beijing, China
    Zhongguo Shengwu Yixue Gongcheng Xuebao, 2 (1-6):
  • [42] Patterns of spatio-temporal activation during the perception and production of speech
    Humphries, C
    Buchsbaum, B
    Love, T
    Swinney, D
    Hickok, G
    JOURNAL OF COGNITIVE NEUROSCIENCE, 2000, : 45 - 46
  • [43] On the spatio-temporal dynamics of a class of cellular neural networks
    Goras, L
    Teodorescu, TD
    Ghinea, R
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2003, 12 (04) : 399 - 416
  • [44] Spatio-temporal influences at the neural level of object recognition
    Wallis, G
    NETWORK-COMPUTATION IN NEURAL SYSTEMS, 1998, 9 (02) : 265 - 278
  • [45] An iterative spatio-temporal speech enhancement algorithm for microphone arrays
    Gupta, Malay
    Douglas, Scott C.
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 81 - 84
  • [46] A neural network filter for complex spatio-temporal patterns
    Ma, JW
    Huang, DZ
    PROCEEDING OF THE 2002 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3, 2002, : 1028 - 1033
  • [47] Learning spatio-temporal patterns with Neural Cellular Automata
    Richardson, Alex D.
    Antal, Tibor
    Blythe, Richard A.
    Schumacher, Linus J.
    PLOS COMPUTATIONAL BIOLOGY, 2024, 20 (04)
  • [48] STNN: A Spatio-Temporal Neural Network for Traffic Predictions
    He, Zhixiang
    Chow, Chi-Yin
    Zhang, Jia-Dong
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2021, 22 (12) : 7642 - 7651
  • [49] Imitation Learning of Neural Spatio-Temporal Point Processes
    Zhu, Shixiang
    Li, Shuang
    Peng, Zhigang
    Xie, Yao
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (11) : 5391 - 5402
  • [50] On the inclusion of spatial information for spatio-temporal neural networks
    Rodrigo de Medrano
    José L. Aznarte
    Neural Computing and Applications, 2021, 33 : 14723 - 14740