DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement

Cited by: 32
Authors
Lv, Shubo [1]
Hu, Yanxin [1]
Zhang, Shimin [1]
Xie, Lei [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
Source
INTERSPEECH 2021 | 2021
Keywords
speech enhancement; sub-band processing; deep complex convolution recurrent network; NEURAL-NETWORK;
DOI
10.21437/Interspeech.2021-1482
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Deep complex convolution recurrent network (DCCRN), which extends CRN with complex structure, has achieved superior performance in MOS evaluation in the Interspeech 2020 Deep Noise Suppression Challenge (DNS2020). This paper further extends DCCRN with the following significant revisions. We first extend the model to sub-band processing, where the bands are split and merged by learnable neural network filters instead of engineered FIR filters, leading to a faster noise suppressor trained in an end-to-end manner. Then the LSTM is further substituted with a complex TF-LSTM to better model temporal dependencies along both time and frequency axes. Moreover, instead of simply concatenating the output of each encoder layer to the input of the corresponding decoder layer, we use convolution blocks to first aggregate essential information from the encoder output before feeding it to the decoder layers. We specifically formulate the decoder with an extra a priori SNR estimation module to maintain good speech quality while removing noise. Finally, a post-processing module is adopted to further suppress the unnatural residual noise. The new model, named DCCRN+, has surpassed the original DCCRN as well as several competitive models in terms of PESQ and DNSMOS, and has achieved superior performance in the new Interspeech 2021 DNS challenge.
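The a priori SNR mentioned in the abstract is conventionally defined per time-frequency bin as the ratio of clean-speech power to noise power, and the classical Wiener gain follows from it as xi / (1 + xi). The sketch below illustrates only that textbook relationship; the abstract does not specify how DCCRN+ formulates or trains its SNR estimation module, so the function names (`a_priori_snr`, `wiener_gain`) and the power values are hypothetical.

```python
def a_priori_snr(speech_power, noise_power, eps=1e-12):
    """Per-bin a priori SNR: clean-speech power divided by noise power.

    eps guards against division by zero in silent noise bins.
    """
    return [s / (n + eps) for s, n in zip(speech_power, noise_power)]


def wiener_gain(snr):
    """Classical Wiener gain xi / (1 + xi); always lies in [0, 1)."""
    return [xi / (1.0 + xi) for xi in snr]


# Hypothetical power values for four time-frequency bins (not from the paper).
speech_power = [4.0, 1.0, 0.25, 0.0]
noise_power = [1.0, 1.0, 1.0, 1.0]

snr = a_priori_snr(speech_power, noise_power)
gain = wiener_gain(snr)

for xi, g in zip(snr, gain):
    print(f"a priori SNR = {xi:.3f}, Wiener gain = {g:.3f}")
```

Bins dominated by speech get a gain near 1 and are passed almost unchanged, while bins dominated by noise get a gain near 0, which is one way an SNR estimate can help preserve speech quality during suppression.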
Pages: 2816 - 2820
Page count: 5
Related papers
50 in total
  • [21] DENSELY CONNECTED MULTI-STAGE MODEL WITH CHANNEL WISE SUBBAND FEATURE FOR REAL-TIME SPEECH ENHANCEMENT
    Li, Jingdong
    Luo, Dawei
    Liu, Yun
    Zhu, Yuanyuan
    Li, Zhaoxia
    Cui, Guohui
    Tang, Wenqi
    Chen, Wei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6638 - 6642
  • [22] Vehicle Tracking at Nighttime by Kernelized Experts With Channel-Wise and Temporal Reliability Estimation
    Tian, Wei
    Chen, Long
    Zou, Ke
    Lauer, Martin
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2018, 19 (10) : 3159 - 3169
  • [23] Improved Speech Emotion Recognition Using Channel-wise Global Head Pooling (CwGHP)
    Chauhan, Krishna
    Sharma, Kamalesh Kumar
    Varma, Tarun
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (09) : 5500 - 5522
  • [24] CHANNEL-WISE AV-FUSION ATTENTION FOR MULTI-CHANNEL AUDIO-VISUAL SPEECH RECOGNITION
    Xu, Gaopeng
    Yang, Song
    Li, Wei
    Wang, Song
    Wei, Guo
    Yuan, Junfeng
    Gao, Jie
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 9251 - 9255
  • [25] Noisy Speech Enhancement Using a Novel a Priori SNR Estimation
    Deng, Chao
    Liu, Xiao-rui
    Liu, Hong-min
    Wang, Zhi-heng
    ADVANCES IN COMPUTER SCIENCE, INTELLIGENT SYSTEM AND ENVIRONMENT, VOL 2, 2011, 105 : 139 - +
  • [26] Improved a priori SNR estimation for speech enhancement incorporating speech distortion component
    Ou, S. (250800719@qq.com), 1600, Universitas Ahmad Dahlan, Jalan Kapas 9, Semaki, Umbul Harjo,, Yogiakarta, 55165, Indonesia (11):
  • [27] Relaxed statistical model for speech enhancement and a priori SNR estimation
    Cohen, I
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (05) : 870 - 881
  • [28] CONVEX COMBINATION FRAMEWORK FOR A PRIORI SNR ESTIMATION IN SPEECH ENHANCEMENT
    Nahma, Lara
    Yong, Pei Chee
    Dam, Hai Huyen
    Nordholm, Sven
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4975 - 4979
  • [29] Noise Spectrum Estimation Based on SNR Discrepancy for Speech Enhancement
    Saha, Atanu
    Shimamura, Tetsuya
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (02) : 373 - 377
  • [30] Channel-Wise Interactive Learning for Remote Heart Rate Estimation From Facial Video
    Li, Qi
    Guo, Dan
    Qian, Wei
    Tian, Xilan
    Sun, Xiao
    Zhao, Haifeng
    Wang, Meng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (06) : 4542 - 4555