DCTCN: Deep Complex Temporal Convolutional Network for Long Time Speech Enhancement

Cited: 1
Authors:
Ren, Jigang [1]
Mao, Qirong [1,2]
Affiliations:
[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang, Jiangsu, Peoples R China
[2] Jiangsu Key Lab Secur Tech Industrial Cyberspace, Zhenjiang, Jiangsu, Peoples R China
Funding: National Natural Science Foundation of China
Keywords: speech enhancement; complex temporal convolution network; deep learning; selective kernel network
DOI: 10.21437/Interspeech.2022-11269
CLC Classification: O42 [Acoustics]
Subject Classification: 070206; 082403
Abstract
Recently, with the rapid development of deep learning, the performance of monaural speech enhancement (SE) in terms of intelligibility and speech quality has improved significantly. In the time-frequency (TF) domain, convolutional neural networks (CNNs) are generally used to predict a mask that maps the noisy amplitude spectrum to the clean amplitude spectrum. The deep complex convolution recurrent network (DCCRN) applies complex-valued arithmetic to its convolutional and long short-term memory (LSTM) layers and has achieved good results. However, LSTM can only model short time frames, and its performance often degrades when processing information over longer time spans. The single convolution kernel size of the encoder-decoder also limits the model's ability to extract and restore features. In this paper, we design a new network to address these problems, called the Deep Complex Temporal Convolutional Network (DCTCN), in which the temporal convolutional network (TCN) follows the rules of complex-valued computation. The encoder and decoder use the selective kernel network (SKNet) to capture multi-scale receptive fields during the encoding and decoding phases. Compared with DCCRN, the proposed DCTCN models long time series more effectively, and SKNet extracts and restores finer-grained features. On the TIMIT and VoiceBank+DEMAND datasets, our model obtains very competitive results compared with previous models.
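The "rules of complex-valued computation" mentioned in the abstract refer to building a complex-valued layer out of real-valued ones via the complex multiplication identity (a + bi)(c + di) = (ac - bd) + (ad + bc)i, as popularized by DCCRN. A minimal sketch of this idea for a 1-D convolution is given below; the function name, shapes, and NumPy-based setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def complex_conv1d(x_real, x_imag, w_real, w_imag):
    """Complex 1-D convolution built from four real-valued convolutions.

    Follows the DCCRN-style rule:
      out = (x_r * w_r - x_i * w_i) + j (x_r * w_i + x_i * w_r)
    (Illustrative sketch; not the paper's actual layer.)
    """
    rr = np.convolve(x_real, w_real, mode="valid")  # real * real
    ii = np.convolve(x_imag, w_imag, mode="valid")  # imag * imag
    ri = np.convolve(x_real, w_imag, mode="valid")  # real * imag
    ir = np.convolve(x_imag, w_real, mode="valid")  # imag * real
    return rr - ii, ri + ir  # (real part, imaginary part)

# Sanity check against NumPy's native complex convolution.
rng = np.random.default_rng(0)
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)
out_r, out_i = complex_conv1d(x.real, x.imag, w.real, w.imag)
assert np.allclose(out_r + 1j * out_i, np.convolve(x, w, mode="valid"))
```

In DCTCN the same identity is applied to the dilated convolutions inside each TCN block, so the network processes the real and imaginary parts of the spectrum jointly rather than treating them as independent channels.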
Pages: 5478-5482 (5 pages)