DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement

Cited by: 32
Authors
Lv, Shubo [1]
Hu, Yanxin [1]
Zhang, Shimin [1]
Xie, Lei [1]
Affiliations
[1] Northwestern Polytechnical University, School of Computer Science, Audio, Speech and Language Processing Group (ASLP@NPU), Xi'an, People's Republic of China
Source
Keywords
speech enhancement; sub-band processing; deep complex convolution recurrent network; neural network
DOI
10.21437/Interspeech.2021-1482
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Deep complex convolution recurrent network (DCCRN), which extends CRN with a complex structure, achieved superior performance in the MOS evaluation of the Interspeech 2020 deep noise suppression challenge (DNS2020). This paper further extends DCCRN with the following significant revisions. We first extend the model to sub-band processing, where the bands are split and merged by learnable neural network filters instead of engineered FIR filters, leading to a faster noise suppressor trained in an end-to-end manner. Then the LSTM is replaced with a complex TF-LSTM to better model temporal dependencies along both the time and frequency axes. Moreover, instead of simply concatenating the output of each encoder layer to the input of the corresponding decoder layer, we use convolution blocks to first aggregate essential information from the encoder output before feeding it to the decoder layers. We specifically formulate the decoder with an extra a priori SNR estimation module to maintain good speech quality while removing noise. Finally, a post-processing module is adopted to further suppress the unnatural residual noise. The new model, named DCCRN+, has surpassed the original DCCRN as well as several competitive models in terms of PESQ and DNSMOS, and has achieved superior performance in the new Interspeech 2021 DNS challenge.
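As an illustration of the quantity named in the abstract, the following is a minimal Python sketch of the textbook definition of the a priori SNR per time-frequency bin, together with the Wiener gain it implies. It is not the authors' learned estimation module, which the abstract does not specify in detail. The function names `a_priori_snr` and `wiener_gain`, the `eps` constant, and the toy power values are assumptions made for this example.

```python
import numpy as np


def a_priori_snr(clean_power, noise_power, eps=1e-12):
    """Textbook a priori SNR per time-frequency bin: xi = |S|^2 / |N|^2.

    eps is a small constant (an assumption of this sketch) that avoids
    division by zero when the noise power is zero.
    """
    return clean_power / (noise_power + eps)


def wiener_gain(xi):
    """Wiener gain implied by an a priori SNR: G = xi / (1 + xi)."""
    return xi / (1.0 + xi)


# Toy clean-speech and noise powers for three bins (made-up numbers).
clean_power = np.array([4.0, 1.0, 0.0])
noise_power = np.array([1.0, 1.0, 1.0])

xi = a_priori_snr(clean_power, noise_power)
gain = wiener_gain(xi)
print("a priori SNR:", xi)
print("Wiener gain:", gain)
```

A bin dominated by speech gets a gain close to 1 and a noise-only bin gets a gain of 0. This is one way to see how an SNR estimate could help preserve speech quality while removing noise.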
Pages: 2816-2820
Number of pages: 5
Related papers
50 in total
  • [1] S-DCCRN: SUPER WIDE BAND DCCRN WITH LEARNABLE COMPLEX FEATURE FOR SPEECH ENHANCEMENT
    Lv, Shubo
    Fu, Yihui
    Xing, Mengtao
    Sun, Jiayao
    Xie, Lei
    Huang, Jun
    Wang, Yannan
    Yu, Tao
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7767 - 7771
  • [2] SPATIAL-DCCRN: DCCRN EQUIPPED WITH FRAME-LEVEL ANGLE FEATURE AND HYBRID FILTERING FOR MULTI-CHANNEL SPEECH ENHANCEMENT
    Lv, Shubo
    Fu, Yihui
    Jv, Yukai
    Xie, Lei
    Zhu, Weixin
    Rao, Wei
    Wang, Yannan
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 436 - 443
  • [3] Distil-DCCRN: A Small-Footprint DCCRN Leveraging Feature-Based Knowledge Distillation in Speech Enhancement
    Han, Runduo
    Xu, Weiming
    Zhang, Zihan
    Liu, Mingshuai
    Xie, Lei
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 2075 - 2079
  • [4] DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
    Hu, Yanxin
    Liu, Yun
    Lv, Shubo
    Xing, Mengtao
    Zhang, Shimin
    Fu, Yihui
    Wu, Jian
    Zhang, Bihong
    Xie, Lei
    INTERSPEECH 2020, 2020, : 2472 - 2476
  • [5] Causal Signal-Based DCCRN with Overlapped-Frame Prediction for Online Speech Enhancement
    Bartolewska, Julitta
    Kacprzak, Stanislaw
    Kowalczyk, Konrad
    INTERSPEECH 2023, 2023, : 4039 - 4043
  • [6] Underwater image enhancement via a channel-wise transmission estimation network
    Wang, Qiang
    Fu, Bo
    Fan, Huijie
    IET IMAGE PROCESSING, 2023, 17 (10) : 2958 - 2971
  • [7] A novel skip connection mechanism based on channel-wise cross transformer for speech enhancement
    Jiang, Weiqi
    Sun, Chengli
    Chen, Feilong
    Leng, Yan
    Guo, Qiaosheng
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (12) : 34849 - 34866
  • [8] A novel skip connection mechanism based on channel-wise cross transformer for speech enhancement
    Weiqi Jiang
    Chengli Sun
    Feilong Chen
    Yan Leng
    Qiaosheng Guo
    Multimedia Tools and Applications, 2024, 83 : 34849 - 34866
  • [9] Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music
    Liu, Haohe
    Xie, Lei
    Wu, Jian
    Yang, Geng
    INTERSPEECH 2020, 2020, : 1241 - 1245
  • [10] A priori SNR estimation and noise estimation for speech enhancement
    Yao, Rui
    Zeng, ZeQing
    Zhu, Ping
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2016,