DCTCN: Deep Complex Temporal Convolutional Network for Long Time Speech Enhancement

Cited by: 1
Authors
Ren, Jigang [1]
Mao, Qirong [1,2]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang, Jiangsu, Peoples R China
[2] Jiangsu Key Lab Secur Tech Industrial Cyberspace, Zhenjiang, Jiangsu, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
speech enhancement; complex temporal convolution network; deep learning; selective kernel network
DOI
10.21437/Interspeech.2022-11269
Chinese Library Classification
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Recently, with the rapid development of deep learning, the performance of monaural speech enhancement (SE) in terms of intelligibility and speech quality has improved significantly. In the time-frequency (TF) domain, convolutional neural networks (CNNs) are generally used to predict a mask that maps the noisy amplitude spectrum to the clean amplitude spectrum. The deep complex convolution recurrent network (DCCRN) applies complex-valued arithmetic to its convolutional layers and long short-term memory (LSTM) and has achieved good results. However, LSTM can only model short time frames, and its performance is often not good enough when processing information over longer time frames. The single convolution kernel size of the encoder-decoder also limits the model's ability to extract and restore features. In this paper, we design a new network to handle these problems, called Deep Complex Temporal Convolutional Network (DCTCN), in which the temporal convolution network (TCN) follows the rules of complex-valued calculation. The encoder and decoder use a selective kernel network (SKNet) to capture multi-scale receptive fields in the encoding and decoding phases. Compared with DCCRN, the proposed DCTCN is more effective in modeling long time series, and SKNet can extract and restore more fine-grained features. On the TIMIT and VoiceBank+DEMAND datasets, our model obtains very competitive results compared with previous models.
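As a rough illustration of what a complex-valued temporal convolution involves, the sketch below builds a complex causal dilated 1-D convolution from four real-valued convolutions, using (W_r + iW_i)(x_r + ix_i) = (W_r x_r - W_i x_i) + i(W_r x_i + W_i x_r). It is not the authors' implementation: the function names, the single-channel list-based signals, and the assumption of one convolution per layer in the receptive-field formula are all hypothetical choices made for this example.

```python
def causal_dilated_conv(x, w, dilation=1):
    """Real-valued causal 1-D convolution with implicit zero left-padding.

    out[t] = sum_j w[j] * x[t - (k - 1 - j) * dilation], so the last tap
    of the kernel aligns with the current frame t.
    """
    k = len(w)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = t - (k - 1 - j) * dilation
            if idx >= 0:  # frames before the start of the signal count as zero
                acc += w[j] * x[idx]
        out.append(acc)
    return out


def complex_causal_dilated_conv(xr, xi, wr, wi, dilation=1):
    """Complex convolution assembled from four real convolutions (assumed layout)."""
    rr = causal_dilated_conv(xr, wr, dilation)  # W_r * x_r
    ii = causal_dilated_conv(xi, wi, dilation)  # W_i * x_i
    ri = causal_dilated_conv(xi, wr, dilation)  # W_r * x_i
    ir = causal_dilated_conv(xr, wi, dilation)  # W_i * x_r
    out_r = [a - b for a, b in zip(rr, ii)]     # real part
    out_i = [a + b for a, b in zip(ri, ir)]     # imaginary part
    return out_r, out_i


def receptive_field(kernel_size, dilations):
    """Frames covered by a stack of dilated layers, one convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    out_r, out_i = complex_causal_dilated_conv(
        [1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, -1.0],
        [1.0, 0.5], [0.5, -1.0], dilation=2,
    )
    print(out_r, out_i)
    print(receptive_field(3, [1, 2, 4, 8]))
```

Under these assumptions the receptive field grows with the sum of the dilations, which is the usual argument for why stacked dilated convolutions can cover long time spans with few layers.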
Pages: 5478-5482
Page count: 5
Related papers
50 in total
  • [1] Complex-valued temporal convolutional network for speech enhancement
    Song, Jiaqi
    Zou, Lian
    Zhou, Liqing
    Liu, Ziao
    Fan, Cien
    Wang, Bin
    [J]. INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2024, 22 (05)
  • [2] Speech enhancement using deep complex convolutional neural network (DCCNN) model
    Iqbal, Yasir
    Zhang, Tao
    Fahad, Muhammad
    Rahman, Sadiq ur
    Iqbal, Anjum
    Geng, Yanzhang
    Zhao, Xin
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2024, : 8675 - 8692
  • [3] TCNN: TEMPORAL CONVOLUTIONAL NEURAL NETWORK FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN
    Pandey, Ashutosh
    Wang, DeLiang
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6875 - 6879
  • [4] Speech Enhancement based on Deep Convolutional Neural Network
    Nuthakki, Ramesh
    Masanta, Payel
    Yukta, T. N.
    [J]. PROCEEDINGS OF THE 2021 FIFTH INTERNATIONAL CONFERENCE ON I-SMAC (IOT IN SOCIAL, MOBILE, ANALYTICS AND CLOUD) (I-SMAC 2021), 2021, : 770 - 775
  • [5] DEEP COMPLEX CONVOLUTIONAL RECURRENT NETWORK FOR MULTI-CHANNEL SPEECH ENHANCEMENT AND DEREVERBERATION
    Gelderblom, Femke B.
    Myrvoll, Tor Andre
    [J]. 2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2021,
  • [6] Speech Enhancement of Complex Convolutional Recurrent Network with Attention
    Zeng, Jiangjiao
    Yang, Lidong
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 42 (3) : 1834 - 1847
  • [7] Speech Enhancement of Complex Convolutional Recurrent Network with Attention
    Jiangjiao Zeng
    Lidong Yang
    [J]. Circuits, Systems, and Signal Processing, 2023, 42 : 1834 - 1847
  • [8] Convolutional Deep Neural Network and Full Connectivity for Speech Enhancement
    Alameri, Ban M.
    Kadhim, Inas Jawad
    Hadi, Suha Qasim
    Hassoon, Ali F.
    Abd, Mustafa M.
    Premaratne, Prashan
    [J]. INTERNATIONAL JOURNAL OF ONLINE AND BIOMEDICAL ENGINEERING, 2023, 19 (04) : 140 - 154
  • [9] Temporal Convolutional Network with Frequency Dimension Adaptive Attention for Speech Enhancement
    Zhang, Qiquan
    Song, Qi
    Nicolson, Aaron
    Lan, Tian
    Li, Haizhou
    [J]. INTERSPEECH 2021, 2021, : 166 - 170
  • [10] COMPLEX SPECTRAL MAPPING WITH A CONVOLUTIONAL RECURRENT NETWORK FOR MONAURAL SPEECH ENHANCEMENT
    Tan, Ke
    Wang, DeLiang
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6865 - 6869