SPATIAL-DCCRN: DCCRN EQUIPPED WITH FRAME-LEVEL ANGLE FEATURE AND HYBRID FILTERING FOR MULTI-CHANNEL SPEECH ENHANCEMENT

Cited by: 4
Authors
Lv, Shubo [1, 2]
Fu, Yihui [1]
Jv, Yukai [1, 2]
Xie, Lei [1 ]
Zhu, Weixin [2 ]
Rao, Wei [2 ]
Wang, Yannan [2 ]
Affiliations
[1] Northwestern Polytech Univ, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
[2] Tencent Corp, Tencent Ethereal Audio Lab, Shenzhen, Peoples R China
Source
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022
Keywords
multi-channel; Spatial-DCCRN; speech enhancement;
DOI
10.1109/SLT54892.2023.10022488
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, multi-channel speech enhancement has drawn much interest because spatial information can be used to distinguish target speech from interfering signals. To make full use of spatial information and neural-network-based mask estimation, we propose a multi-channel denoising neural network, Spatial-DCCRN. First, we extend S-DCCRN to the multi-channel scenario, adopting a cascaded sub-channel and full-channel processing strategy that can model different channels separately. Moreover, instead of using only the multi-channel spectrum, or concatenating the first channel's magnitude and the inter-channel phase difference (IPD) as the model's inputs, we apply an angle feature extraction (AFE) module to extract frame-level angle feature embeddings, which help the model perceive spatial information explicitly. Finally, since residual noise is more severe when noise and speech occupy the same time-frequency (TF) bin, we design a masking and mapping filtering method to replace the traditional filter-and-sum operation, with the aim of cascading coarse denoising, dereverberation and residual noise suppression. The proposed model, Spatial-DCCRN, surpasses EaBNet, FasNet and several other competitive models on the L3DAS22 Challenge dataset. Beyond the 3D scenario, Spatial-DCCRN also outperforms the state-of-the-art (SOTA) model MIMO-UNet by a large margin on multiple evaluation metrics on the multi-channel ConferencingSpeech2021 Challenge dataset. Ablation studies also demonstrate the effectiveness of the individual contributions.
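As background for two terms used in the abstract, the following is a minimal single-bin sketch of the IPD input feature and of mask-based filtering. It is not the paper's AFE module or its hybrid masking and mapping filter; the function names `ipd` and `apply_complex_mask` and the sample values are hypothetical, chosen only for illustration.

```python
import cmath
import math


def ipd(ref_bin: complex, other_bin: complex) -> float:
    """Inter-channel phase difference (radians) for one TF bin.

    Computed as the phase of other * conj(ref), so the result is
    already wrapped to [-pi, pi].
    """
    return cmath.phase(other_bin * ref_bin.conjugate())


def apply_complex_mask(mask: complex, noisy_bin: complex) -> complex:
    """Masking: the enhanced bin is the complex product of mask and noisy bin."""
    return mask * noisy_bin


# Hypothetical example: two microphones observe the same TF bin,
# and the second channel lags the reference by 90 degrees.
ref = cmath.rect(1.0, 0.0)
other = cmath.rect(0.8, -math.pi / 2)
phase_diff = ipd(ref, other)

# A real-valued mask of 0.5 simply halves the magnitude of the bin.
enhanced = apply_complex_mask(0.5 + 0.0j, ref)
```

A mapping approach, by contrast, would have the network predict the enhanced spectrum directly instead of a multiplicative mask.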
Pages: 436 - 443
Number of pages: 8
Related papers
15 in total
  • [1] A Feature Integration Network for Multi-Channel Speech Enhancement
    Zeng, Xiao
    Zhang, Xue
    Wang, Mingjiang
    SENSORS, 2024, 24 (22)
  • [2] Multi-channel speech enhancement in a car environment using wiener filtering and spectral subtraction
    Meyer, J
    Simmer, KU
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1167 - 1170
  • [3] Three-stage hybrid neural beamformer for multi-channel speech enhancement
    Kuang, Kelan
    Yang, Feiran
    Li, Junfeng
    Yang, Jun
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (06): 3378 - 3389
  • [4] Two-Stage Single-Channel Speech Enhancement with Multi-Frame Filtering
    Lin, Shaoxiong
    Zhang, Wangyou
    Qian, Yanmin
    APPLIED SCIENCES-BASEL, 2023, 13 (08):
  • [5] Probabilistic spatial filter estimation for signal enhancement in multi-channel automatic speech recognition
    Kayseri, Hendrik
    Moritz, Niko
    Anemueller, Joern
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2562 - 2566
  • [6] Noise eigenvalue modification methods for spatial subspace based multi-channel speech enhancement
    Kim, Gibak
    Cho, Nam Ik
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 573 - +
  • [7] ENHANCING END-TO-END MULTI-CHANNEL SPEECH SEPARATION VIA SPATIAL FEATURE LEARNING
    Gu, Rongzhi
    Zhang, Shi-Xiong
    Chen, Lianwu
    Xu, Yong
    Yu, Meng
    Su, Dan
    Zou, Yuexian
    Yu, Dong
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7319 - 7323
  • [8] AUTOMATIC CHANNEL SELECTION AND SPATIAL FEATURE INTEGRATION FOR MULTI-CHANNEL SPEECH RECOGNITION ACROSS VARIOUS ARRAY TOPOLOGIES
    Mu, Bingshen
    Guo, Pengcheng
    Guo, Dake
    Zhou, Pan
    Chen, Wei
    Xie, Lei
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024), 2024, : 11396 - 11400
  • [9] Speech Enhancement Using Multi-channel Post-Filtering with Modified Signal Presence Probability in Reverberant Environment
    Wang Xiaofei
    Guo Yanmeng
    Fu Qiang
    Yan Yonghong
    CHINESE JOURNAL OF ELECTRONICS, 2016, 25 (03) : 512 - 519