DENSELY CONNECTED NEURAL NETWORK WITH DILATED CONVOLUTIONS FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN

Authors
Pandey, Ashutosh [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
time domain; fully convolutional; dense network; time-frequency loss; speaker- and noise-independent
DOI
10.1109/icassp40776.2020.9054536
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this work, we propose a fully convolutional neural network for real-time speech enhancement in the time domain. The proposed network is an encoder-decoder architecture with skip connections. The layers in the encoder and the decoder are followed by densely connected blocks comprising dilated and causal convolutions. The dilated convolutions help aggregate context at different resolutions. The causal convolutions prevent information flow from future frames, making the network suitable for real-time applications. We also propose to use sub-pixel convolutional layers in the decoder for upsampling. Further, the model is trained using a loss function with two components: a time-domain loss and a frequency-domain loss. The proposed loss function outperforms the time-domain loss alone. Experimental results show that the proposed model significantly outperforms other real-time state-of-the-art models in terms of objective intelligibility and quality scores.
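As an illustration of two ideas named in the abstract, the following is a minimal pure-Python sketch, not the authors' implementation. It shows a single-channel causal dilated convolution (each output depends only on current and past samples) and a two-component loss that mixes a time-domain term with a frequency-domain term. The function names, the choice of mean squared error and DFT-magnitude difference, and the weight `alpha` are all assumptions made for this example; the abstract does not specify them.

```python
import cmath


def causal_dilated_conv1d(x, kernel, dilation=1):
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so no output ever
    reads a future input sample (the causality property).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # skip taps that fall before the signal start
                acc += w * x[idx]
        y.append(acc)
    return y


def dft_magnitudes(x):
    """Naive O(n^2) DFT magnitude spectrum of a short signal."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)))
        for f in range(n)
    ]


def combined_loss(estimate, target, alpha=0.5):
    """Hypothetical two-component loss: alpha * time + (1 - alpha) * frequency.

    Time term: mean squared error on samples (an assumption).
    Frequency term: mean absolute difference of DFT magnitudes (an assumption).
    """
    n = len(target)
    time_loss = sum((e - s) ** 2 for e, s in zip(estimate, target)) / n
    mag_e = dft_magnitudes(estimate)
    mag_s = dft_magnitudes(target)
    freq_loss = sum(abs(a - b) for a, b in zip(mag_e, mag_s)) / n
    return alpha * time_loss + (1 - alpha) * freq_loss


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
# With kernel [1, 1] and dilation 2, each output is x[t] + x[t - 2].
out = causal_dilated_conv1d(signal, [1.0, 1.0], dilation=2)
```

Stacking such layers with growing dilation widens the receptive field over past samples without adding look-ahead, which is the property the abstract relies on for real-time use.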
Pages: 6629-6633
Number of pages: 5
Related papers
50 in total
  • [1] DENSELY CONNECTED NETWORK WITH TIME-FREQUENCY DILATED CONVOLUTION FOR SPEECH ENHANCEMENT
    Li, Yaxing
    Li, Xiaoqi
    Dong, Yuanjie
    Li, Meng
    Xu, Shan
    Xiong, Shengwu
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6860 - 6864
  • [2] TCNN: TEMPORAL CONVOLUTIONAL NEURAL NETWORK FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN
    Pandey, Ashutosh
    Wang, DeLiang
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6875 - 6879
  • [3] DCT based densely connected convolutional GRU for real-time speech enhancement
    Jannu, Chaitanya
    Vanambathina, Sunny Dayal
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (01) : 1195 - 1208
  • [4] A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement
    Tan, Ke
    Wang, DeLiang
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3229 - 3233
  • [5] A MODULATION-DOMAIN LOSS FOR NEURAL-NETWORK-BASED REAL-TIME SPEECH ENHANCEMENT
    Vuong, Tyler
    Xia, Yangyang
    Stern, Richard M.
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6643 - 6647
  • [6] Real time speech enhancement using densely connected neural networks and Squeezed temporal convolutional modules
    Vanambathina, Sunny Dayal
    Burra, Manaswini
    Edupalli, Bhumika
    Vallem, Eswar Reddy
    Nellore, Venkata Sravani
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (17) : 50289 - 50305
  • [7] Real time speech enhancement using densely connected neural networks and Squeezed temporal convolutional modules
    Sunny Dayal Vanambathina
    Manaswini Burra
    Bhumika Edupalli
    Eswar Reddy Vallem
    Venkata Sravani Nellore
    [J]. Multimedia Tools and Applications, 2024, 83 : 50289 - 50305
  • [8] Real-Time Speech Enhancement Based on Convolutional Recurrent Neural Network
    Girirajan, S.
    Pandian, A.
    [J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 35 (02) : 1987 - 2001
  • [9] DENSELY CONNECTED MULTI-STAGE MODEL WITH CHANNEL WISE SUBBAND FEATURE FOR REAL-TIME SPEECH ENHANCEMENT
    Li, Jingdong
    Luo, Dawei
    Liu, Yun
    Zhu, Yuanyuan
    Li, Zhaoxia
    Cui, Guohui
    Tang, Wenqi
    Chen, Wei
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6638 - 6642
  • [10] WEIGHTED SPEECH DISTORTION LOSSES FOR NEURAL-NETWORK-BASED REAL-TIME SPEECH ENHANCEMENT
    Xia, Yangyang
    Braun, Sebastian
    Reddy, Chandan K. A.
    Dubey, Harishchandra
    Cutler, Ross
    Tashev, Ivan
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 871 - 875