Gated Residual Networks With Dilated Convolutions for Monaural Speech Enhancement

Cited by: 133
Authors
Tan, Ke [1]
Chen, Jitong [1,2]
Wang, DeLiang [1,3]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Baidu Res, Silicon Valley AI Lab, Sunnyvale, CA 94089 USA
[3] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Dilated convolutions; residual learning; gated linear units; sequence-to-sequence mapping; speech enhancement
DOI
10.1109/TASLP.2018.2876171
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
For supervised speech enhancement, contextual information is important for accurate mask estimation or spectral mapping. However, commonly used deep neural networks (DNNs) are limited in capturing temporal contexts. To leverage long-term contexts for tracking a target speaker, we treat speech enhancement as a sequence-to-sequence mapping, and present a novel convolutional neural network (CNN) architecture for monaural speech enhancement. The key idea is to systematically aggregate contexts through dilated convolutions, which significantly expand receptive fields. The CNN model additionally incorporates gating mechanisms and residual learning. Our experimental results suggest that the proposed model generalizes well to untrained noises and untrained speakers. It consistently outperforms a DNN, a unidirectional long short-term memory (LSTM) model, and a bidirectional LSTM model in terms of objective speech intelligibility and quality metrics. Moreover, the proposed model has far fewer parameters than DNN and LSTM models.
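The abstract's claim that dilated convolutions expand receptive fields can be checked with simple arithmetic. The sketch below is illustrative only: the function name `receptive_field`, the kernel size of 3, and the dilation rates 1, 2, 4, 8, 16 are assumptions chosen for the example, not the configuration reported in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in time frames, of stacked stride-1 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d frames.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Hypothetical five-layer stacks with kernel size 3 (assumed values).
undilated = receptive_field(3, [1, 1, 1, 1, 1])
dilated = receptive_field(3, [1, 2, 4, 8, 16])

print(undilated, dilated)
```

With the same number of layers and weights, doubling the dilation rate at each layer makes the context grow exponentially with depth rather than linearly, which is consistent with the abstract's point about aggregating long-term context with few parameters.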
Pages: 189-198
Number of pages: 10
Related papers
50 in total
  • [1] Monaural speech enhancement with dilated convolutions
    Pirhosseinloo, Shadi
    Brumberg, Jonathan S.
    [J]. INTERSPEECH 2019, 2019, : 3143 - 3147
  • [2] GATED RESIDUAL NETWORKS WITH DILATED CONVOLUTIONS FOR SUPERVISED SPEECH SEPARATION
    Tan, Ke
    Chen, Jitong
    Wang, DeLiang
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 21 - 25
  • [3] Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement
    Tan, Ke
    Wang, DeLiang
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 380 - 390
  • [4] FurcaNeXt: End-to-End Monaural Speech Separation with Dynamic Gated Dilated Temporal Convolutional Networks
    Zhang, Liwen
    Shi, Ziqiang
    Han, Jiqing
    Shi, Anyan
    Ma, Ding
    [J]. MULTIMEDIA MODELING (MMM 2020), PT I, 2020, 11961 : 653 - 665
  • [5] Dilated convolutional recurrent neural network for monaural speech enhancement
    Pirhosseinloo, Shadi
    Brumberg, Jonathan S.
    [J]. CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2019, : 158 - 162
  • [6] Convolutional gated recurrent unit networks based real-time monaural speech enhancement
    Vanambathina, Sunny Dayal
    Anumola, Vaishnavi
    Tejasree, Ponnapalli
    Divya, R.
    Manaswini, B.
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (29) : 45717 - 45732
  • [7] TWO-STAGE SPEECH ENHANCEMENT USING GATED CONVOLUTIONS
    Thieling, Lars
    Jax, Peter
    [J]. 2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
  • [8] Monaural Speech Enhancement Based on Attention-Gate Dilated Convolution Network
    Zhang Tianqi
    Bai Haojun
    Ye Shaopeng
    Liu Jianxing
    [J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2022, 44 (09) : 3277 - 3288