Gated Residual Networks With Dilated Convolutions for Monaural Speech Enhancement

Cited by: 139
Authors
Tan, Ke [1]
Chen, Jitong [1, 2]
Wang, DeLiang [1, 3]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Baidu Res, Silicon Valley AI Lab, Sunnyvale, CA 94089 USA
[3] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Dilated convolutions; residual learning; gated linear units; sequence-to-sequence mapping; speech enhancement
DOI
10.1109/TASLP.2018.2876171
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
For supervised speech enhancement, contextual information is important for accurate mask estimation or spectral mapping. However, commonly used deep neural networks (DNNs) are limited in capturing temporal contexts. To leverage long-term contexts for tracking a target speaker, we treat speech enhancement as a sequence-to-sequence mapping, and present a novel convolutional neural network (CNN) architecture for monaural speech enhancement. The key idea is to systematically aggregate contexts through dilated convolutions, which significantly expand receptive fields. The CNN model additionally incorporates gating mechanisms and residual learning. Our experimental results suggest that the proposed model generalizes well to untrained noises and untrained speakers. It consistently outperforms a DNN, a unidirectional long short-term memory (LSTM) model, and a bidirectional LSTM model in terms of objective speech intelligibility and quality metrics. Moreover, the proposed model has far fewer parameters than DNN and LSTM models.
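The abstract's central claim, that dilated convolutions expand receptive fields while gating and residual learning shape the output, can be illustrated with a minimal pure-Python sketch. The kernel size, the dilation rates, and the function names `receptive_field`, `dilated_conv1d`, and `gated_residual_block` are assumptions chosen for illustration; they are not taken from the paper and do not reproduce its architecture.

```python
import math

def receptive_field(kernel_size, dilations):
    """Input frames seen by one output of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def dilated_conv1d(x, weights, dilation):
    """Zero-padded, same-length 1-D cross-correlation with an odd-length kernel."""
    half = len(weights) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t + (k - half) * dilation  # taps are spaced `dilation` frames apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_residual_block(x, w_lin, w_gate, dilation):
    """Hypothetical block: y = x + conv(x; w_lin) * sigmoid(conv(x; w_gate))."""
    lin = dilated_conv1d(x, w_lin, dilation)    # linear path
    gate = dilated_conv1d(x, w_gate, dilation)  # gate path (gated linear unit)
    return [xi + a * sigmoid(g) for xi, a, g in zip(x, lin, gate)]

# Six layers with kernel size 3 (illustrative values only).
dilated = receptive_field(3, [1, 2, 4, 8, 16, 32])  # 1 + 2 * 63 = 127 frames
plain = receptive_field(3, [1] * 6)                 # 1 + 2 * 6  = 13 frames
```

With the same number of layers and weights, doubling the dilation at each layer grows the receptive field exponentially rather than linearly, which is one way a model can cover long temporal context with few parameters.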
Pages: 189-198
Number of pages: 10
Related papers
50 in total
  • [21] Adversarial Dictionary Learning for Monaural Speech Enhancement
    Ji, Yunyun
    Xu, Longting
    Zhu, Wei-Ping
    INTERSPEECH 2020, 2020, : 4034 - 4038
  • [22] GAN-in-GAN for Monaural Speech Enhancement
    Duan, Yicun
    Ren, Jianfeng
    Yu, Heng
    Jiang, Xudong
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 853 - 857
  • [23] Monaural speech enhancement based on periodicity analysis
    Chen, Z.
    Hohmann, V
    BIOMEDICAL ENGINEERING-BIOMEDIZINISCHE TECHNIK, 2014, 59 : S736 - S736
  • [24] Speech Enhancement with Wide Residual Networks in Reverberant Environments
    Llombart, Jorge
    Ribas, Dayana
    Miguel, Antonio
    Vicente, Luis
    Ortega, Alfonso
    Lleida, Eduardo
    INTERSPEECH 2019, 2019, : 1811 - 1815
  • [25] Combining Multi-Perspective Attention Mechanism With Convolutional Networks for Monaural Speech Enhancement
    Lan, Tian
    Lyu, Yilan
    Ye, Wenzheng
    Hui, Guoqiang
    Xu, Zenglin
    Liu, Qiao
    IEEE ACCESS, 2020, 8 : 78979 - 78991
  • [26] Performance analysis of low complexity fully connected neural networks for monaural speech enhancement
    Reddy, Himavanth
    Kar, Asutosh
    Ostergaard, Jan
    APPLIED ACOUSTICS, 2022, 190
  • [27] Dilated Residual Networks
    Yu, Fisher
    Koltun, Vladlen
    Funkhouser, Thomas
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 636 - 644
  • [28] End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network
    Shi, Ziqiang
    Lin, Huibin
    Liu, Liu
    Liu, Rujie
    Hayakawa, Shoji
    Harada, Shouji
    Han, Jiqing
    INTERSPEECH 2019, 2019, : 4614 - 4618
  • [29] SPEECH ENHANCEMENT BY SEPARATION OF SOURCES IN A MIXTURE OF CONVOLUTIONS
    THI, HLN
    CAELEN, J
    JUTTEN, C
    JOURNAL DE PHYSIQUE IV, 1994, 4 (C5): : 541 - 544
  • [30] Masking Estimation with Phase Restoration of Clean Speech for Monaural Speech Enhancement
    Wang, Xianyun
    Bao, Changchun
    INTERSPEECH 2019, 2019, : 3188 - 3192