TCNN: TEMPORAL CONVOLUTIONAL NEURAL NETWORK FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN

Citations: 0
Authors
Pandey, Ashutosh [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
noise-independent and speaker-independent speech enhancement; real-time implementation; time domain; temporal convolutional neural network; TCNN; noise
DOI
10.1109/icassp.2019.8683634
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This work proposes a fully convolutional neural network (CNN) for real-time speech enhancement in the time domain. The proposed CNN is an encoder-decoder architecture with an additional temporal convolutional module (TCM) inserted between the encoder and the decoder. We call this architecture a Temporal Convolutional Neural Network (TCNN). The encoder in the TCNN creates a low-dimensional representation of a noisy input frame. The TCM uses causal and dilated convolutional layers to utilize the encoder output of the current and previous frames. The decoder uses the TCM output to reconstruct the enhanced frame. The proposed model is trained in a speaker- and noise-independent way. Experimental results demonstrate that the proposed model gives consistently better enhancement results than a state-of-the-art real-time convolutional recurrent model. Moreover, since the model is fully convolutional, it has far fewer trainable parameters than earlier models.
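The causal, dilated convolution that the abstract attributes to the TCM can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `causal_dilated_conv1d` and `receptive_field`, the kernel weights, and the dilation values are all hypothetical and chosen only for illustration. The sketch shows that each output depends only on the current and earlier inputs, and how stacking dilated layers widens the span of past frames that one output can see.

```python
from typing import List


def causal_dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Causal dilated 1-D convolution: y[t] = sum_k kernel[k] * x[t - k * dilation].

    Indices before the start of the sequence are treated as zeros
    (left zero padding), so y[t] never reads a future sample.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input steps visible to one output of a stack of causal layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Illustrative values only (assumptions, not taken from the paper).
frames = [1.0, 2.0, 3.0, 4.0, 5.0]
out = causal_dilated_conv1d(frames, kernel=[1.0, 1.0], dilation=2)

# Changing the last input leaves all earlier outputs untouched (causality).
changed = causal_dilated_conv1d(frames[:-1] + [99.0], kernel=[1.0, 1.0], dilation=2)

span = receptive_field(kernel_size=3, dilations=[1, 2, 4, 8])
```

With these hypothetical settings, doubling the dilation at each layer makes the receptive field grow exponentially with depth while the parameter count grows only linearly, which is the usual motivation for dilated stacks in frame-by-frame processing.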
Pages: 6875-6879
Page count: 5