DENSELY CONNECTED NEURAL NETWORK WITH DILATED CONVOLUTIONS FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN

Cited by: 0

Authors:

Pandey, Ashutosh [1]

Wang, DeLiang [1,2]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Keywords:

time domain; fully convolutional; dense network; time-frequency loss; speaker- and noise-independent

DOI:

10.1109/icassp40776.2020.9054536

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this work, we propose a fully convolutional neural network for real-time speech enhancement in the time domain. The proposed network is an encoder-decoder architecture with skip connections. The layers in the encoder and the decoder are followed by densely connected blocks comprising dilated and causal convolutions. The dilated convolutions aggregate context at different resolutions. The causal convolutions prevent information flow from future frames, which makes the network suitable for real-time applications. We also propose to use sub-pixel convolutional layers in the decoder for upsampling. Further, the model is trained using a loss function with two components: a time-domain loss and a frequency-domain loss. The proposed loss function outperforms the time-domain loss alone. Experimental results show that the proposed model significantly outperforms other real-time state-of-the-art models in terms of objective intelligibility and quality scores.
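The role of the dilated and causal convolutions described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two-tap averaging kernel, and the dilation schedule 1, 2, 4 are assumptions chosen for illustration, and the sketch uses plain Python lists rather than a deep learning framework. It shows that each output sample depends only on current and past inputs, and that stacking layers with growing dilation widens the receptive field.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation].

    Samples before the start of the signal are treated as zero, so the
    output at time t never reads an input later than t (causality).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # skip taps that fall before the signal start
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples seen by one output of the stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


def run_stack(x: Sequence[float], kernel: Sequence[float],
              dilations: Sequence[int]) -> List[float]:
    """Apply one causal dilated convolution per dilation rate, in order."""
    out = list(x)
    for d in dilations:
        out = causal_dilated_conv1d(out, kernel, d)
    return out


# Hypothetical settings, chosen only for illustration.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [0.5, 0.5]
dilations = [1, 2, 4]

clean_out = run_stack(signal, kernel, dilations)

# Perturb only the last (most recent) input sample.
perturbed = signal[:-1] + [100.0]
perturbed_out = run_stack(perturbed, kernel, dilations)

# Earlier outputs are unchanged, because no layer looks into the future.
causal = clean_out[:-1] == perturbed_out[:-1]
print("causal:", causal)
print("receptive field:", receptive_field(len(kernel), dilations))
```

With a kernel of size 2, three layers with dilations 1, 2, and 4 cover 8 input samples, whereas three undilated layers would cover only 4. This is the sense in which dilation aggregates context at different resolutions without adding parameters.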

Citation

Pages: 6629 / 6633

Number of pages: 5

50 records in total

[1] DENSELY CONNECTED NETWORK WITH TIME-FREQUENCY DILATED CONVOLUTION FOR SPEECH ENHANCEMENT
Li, Yaxing
Li, Xiaoqi
Dong, Yuanjie
Li, Meng
Xu, Shan
Xiong, Shengwu
[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6860 - 6864
[2] TCNN: TEMPORAL CONVOLUTIONAL NEURAL NETWORK FOR REAL-TIME SPEECH ENHANCEMENT IN THE TIME DOMAIN
Pandey, Ashutosh
Wang, DeLiang
[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6875 - 6879
[3] DCT based densely connected convolutional GRU for real-time speech enhancement
Jannu, Chaitanya
Vanambathina, Sunny Dayal
[J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (01) : 1195 - 1208
[4] A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement
Tan, Ke
Wang, DeLiang
[J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3229 - 3233
[5] A MODULATION-DOMAIN LOSS FOR NEURAL-NETWORK-BASED REAL-TIME SPEECH ENHANCEMENT
Vuong, Tyler
Xia, Yangyang
Stern, Richard M.
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6643 - 6647
[6] Real time speech enhancement using densely connected neural networks and Squeezed temporal convolutional modules
Vanambathina, Sunny Dayal
Burra, Manaswini
Edupalli, Bhumika
Vallem, Eswar Reddy
Nellore, Venkata Sravani
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (17) : 50289 - 50305
[7] Real time speech enhancement using densely connected neural networks and Squeezed temporal convolutional modules
Sunny Dayal Vanambathina
Manaswini Burra
Bhumika Edupalli
Eswar Reddy Vallem
Venkata Sravani Nellore
[J]. Multimedia Tools and Applications, 2024, 83 : 50289 - 50305
[8] Real-Time Speech Enhancement Based on Convolutional Recurrent Neural Network
Girirajan, S.
Pandian, A.
[J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 35 (02) : 1987 - 2001
[9] DENSELY CONNECTED MULTI-STAGE MODEL WITH CHANNEL WISE SUBBAND FEATURE FOR REAL-TIME SPEECH ENHANCEMENT
Li, Jingdong
Luo, Dawei
Liu, Yun
Zhu, Yuanyuan
Li, Zhaoxia
Cui, Guohui
Tang, Wenqi
Chen, Wei
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6638 - 6642
[10] WEIGHTED SPEECH DISTORTION LOSSES FOR NEURAL-NETWORK-BASED REAL-TIME SPEECH ENHANCEMENT
Xia, Yangyang
Braun, Sebastian
Reddy, Chandan K. A.
Dubey, Harishchandra
Cutler, Ross
Tashev, Ivan
[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 871 - 875
