DENSELY CONNECTED NEURAL NETWORK WITH DILATED CONVOLUTIONS FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN

Authors
Pandey, Ashutosh [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
time domain; fully convolutional; dense network; time-frequency loss; speaker- and noise-independent
DOI
10.1109/icassp40776.2020.9054536
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this work, we propose a fully convolutional neural network for real-time speech enhancement in the time domain. The proposed network is an encoder-decoder architecture with skip connections. The layers in the encoder and the decoder are followed by densely connected blocks comprising dilated and causal convolutions. The dilated convolutions aggregate context at different resolutions. The causal convolutions prevent information flow from future frames, which makes the network suitable for real-time applications. We also propose to use sub-pixel convolutional layers in the decoder for upsampling. Further, the model is trained using a loss function with two components: a time-domain loss and a frequency-domain loss. The proposed loss function outperforms the time-domain loss alone. Experimental results show that the proposed model significantly outperforms other real-time state-of-the-art models in terms of objective intelligibility and quality scores.
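The three building blocks named in the abstract (causal dilated convolution, sub-pixel upsampling, and a loss with time-domain and frequency-domain components) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the single-channel convolution, the frame length, hop size, mean-absolute-error terms, and the weighting factor `alpha` are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np


def causal_dilated_conv1d(x, kernel, dilation):
    """Single-channel causal dilated convolution (illustrative only).

    Computes y[t] = sum_k kernel[k] * x[t - k * dilation], treating samples
    before t = 0 as zero. Only current and past samples contribute, so the
    output at time t never depends on future input.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for k, w in enumerate(kernel):
        shift = k * dilation
        if shift >= len(x):
            break
        # Delay the input by `shift` samples (equivalent to left zero-padding).
        y[shift:] += w * x[: len(x) - shift]
    return y


def subpixel_upsample1d(x, factor):
    """1-D sub-pixel rearrangement: (channels * factor, frames) -> (channels, frames * factor).

    Output sample t * factor + r of a channel is taken from frame t of the
    r-th channel in that channel's group of `factor` input channels.
    """
    x = np.asarray(x)
    grouped_channels, frames = x.shape
    channels = grouped_channels // factor
    grouped = x.reshape(channels, factor, frames)
    return grouped.transpose(0, 2, 1).reshape(channels, frames * factor)


def stft_magnitude(x, frame_len=512, hop=256):
    """Magnitude spectrogram from Hann-windowed frames (assumed settings)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def time_frequency_loss(estimate, target, alpha=0.5):
    """Weighted sum of a time-domain and a frequency-domain error.

    The choice of mean absolute error for both terms and the weight `alpha`
    are assumptions, not values taken from the paper.
    """
    estimate = np.asarray(estimate, dtype=float)
    target = np.asarray(target, dtype=float)
    time_loss = np.mean(np.abs(estimate - target))
    freq_loss = np.mean(np.abs(stft_magnitude(estimate) - stft_magnitude(target)))
    return alpha * time_loss + (1.0 - alpha) * freq_loss
```

With a kernel of length 2 and dilation 2, each output sample combines the current sample and the one two steps earlier. Stacking such layers with growing dilation widens the receptive field without looking ahead, which is the property the abstract relies on for real-time use.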
Pages: 6629-6633
Page count: 5