Gated Residual Networks With Dilated Convolutions for Monaural Speech Enhancement

Cited by: 133
Authors
Tan, Ke [1]
Chen, Jitong [1, 2]
Wang, DeLiang [1, 3]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Baidu Res, Silicon Valley AI Lab, Sunnyvale, CA 94089 USA
[3] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Dilated convolutions; residual learning; gated linear units; sequence-to-sequence mapping; speech enhancement
DOI
10.1109/TASLP.2018.2876171
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
For supervised speech enhancement, contextual information is important for accurate mask estimation or spectral mapping. However, commonly used deep neural networks (DNNs) are limited in capturing temporal contexts. To leverage long-term contexts for tracking a target speaker, we treat speech enhancement as a sequence-to-sequence mapping, and present a novel convolutional neural network (CNN) architecture for monaural speech enhancement. The key idea is to systematically aggregate contexts through dilated convolutions, which significantly expand receptive fields. The CNN model additionally incorporates gating mechanisms and residual learning. Our experimental results suggest that the proposed model generalizes well to untrained noises and untrained speakers. It consistently outperforms a DNN, a unidirectional long short-term memory (LSTM) model, and a bidirectional LSTM model in terms of objective speech intelligibility and quality metrics. Moreover, the proposed model has far fewer parameters than DNN and LSTM models.
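The abstract's key idea, expanding receptive fields through dilated convolutions combined with gating and residual learning, can be illustrated with a minimal pure-Python sketch. This is not the authors' architecture: the kernel size, the dilation rates, the function names, and the gated-linear-unit form `conv_a(x) * sigmoid(conv_b(x))` are illustrative assumptions, and the record gives no layer-level details.

```python
import math

def receptive_field(kernel_size, dilations):
    """Receptive field (in time steps) of stacked 1-D dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def dilated_conv1d(x, weights, dilation):
    """Zero-padded, same-length 1-D dilated convolution (odd kernel size assumed).

    Computed as cross-correlation, as deep learning frameworks usually do.
    """
    half = (len(weights) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - half) * dilation  # taps are spaced `dilation` steps apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_residual_block(x, w_main, w_gate, dilation):
    """Hypothetical block: x + conv_main(x) * sigmoid(conv_gate(x))."""
    main = dilated_conv1d(x, w_main, dilation)
    gate = dilated_conv1d(x, w_gate, dilation)
    return [xi + m * sigmoid(g) for xi, m, g in zip(x, main, gate)]

# Four layers with kernel size 3: doubling the dilation grows the context
# exponentially, while undilated layers grow it only linearly.
print(receptive_field(3, [1, 2, 4, 8]))  # 31
print(receptive_field(3, [1, 1, 1, 1]))  # 9
```

Under these assumptions, four dilated layers cover 31 time frames where four ordinary layers cover 9, with the same number of weights per layer.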
Pages: 189-198
Number of pages: 10
Related papers
50 items in total
  • [41] Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition
    Du, Zhihao
    Han, Jiqing
    Zhang, Xueliang
    [J]. INTERSPEECH 2020, 2020, : 309 - 313
  • [42] Monaural Speech Dereverberation Using Deformable Convolutional Networks
    Kothapally, Vinay
    Hansen, John H. L.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1712 - 1723
  • [43] REDUNDANT CONVOLUTIONAL NETWORK WITH ATTENTION MECHANISM FOR MONAURAL SPEECH ENHANCEMENT
    Lan, Tian
    Lyu, Yilan
    Hui, Guoqiang
    Mokhosi, Refuoe
    Li, Sen
    Liu, Qiao
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6654 - 6658
  • [44] PAN: PHONEME-AWARE NETWORK FOR MONAURAL SPEECH ENHANCEMENT
    Du, Zhihao
    Lei, Ming
    Han, Jiqing
    Zhang, Shiliang
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6634 - 6638
  • [45] MambaGAN: Mamba based Metric GAN for Monaural Speech Enhancement
    Luo, Tianhao
    Zhou, Feng
    Bai, Zhongxin
    [J]. 2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024, : 411 - 416
  • [46] A Time-domain Monaural Speech Enhancement with Feedback Learning
    Li, Andong
    Zheng, Chengshi
    Cheng, Linjuan
    Peng, Renhua
    Li, Xiaodong
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 769 - 774
  • [47] Multi-stage attention network for monaural speech enhancement
    Wang, Kunpeng
    Lu, Wenjing
    Liu, Peng
    Yao, Juan
    Li, Huafeng
    [J]. IET SIGNAL PROCESSING, 2023, 17 (03)
  • [48] Phoneme-dependent NMF for speech enhancement in monaural mixtures
    Raj, Bhiksha
    Singh, Rita
    Virtanen, Tuomas
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1224 - +
  • [49] Online Monaural Speech Enhancement using Delayed Subband LSTM
    Li, Xiaofei
    Horaud, Radu
    [J]. INTERSPEECH 2020, 2020, : 2462 - 2466
  • [50] Fast and Accurate Recognition of Chinese Clinical Named Entities with Residual Dilated Convolutions
    Qiu, Jiahui
    Wang, Qi
    Zhou, Yangming
    Ruan, Tong
    Gao, Ju
    [J]. PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, : 935 - 942