Multi-Resolution Convolutional Residual Neural Networks for Monaural Speech Dereverberation

Times cited: 0

Authors:

Zhao, Lei [1,2]

Zhu, Wenbo [1,2]

Li, Shengqiang [1,2]

Luo, Hong [3]

Zhang, Xiao-Lei [1,2]

Rahardja, Susanto [1]

Affiliations:

[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China

[2] Inst Northwestern Polytech Univ, Res & Dev, Shenzhen 518063, Peoples R China

[3] China Mobile Hangzhou Informat Technol Co Ltd, Hangzhou 311199, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2024 / Vol. 32

Funding:

U.S. National Science Foundation;

Keywords:

Reverberation; Spectrogram; Convolution; Speech recognition; Feature extraction; Deep learning; Context modeling; Multi-resolution framework; speech dereverberation; UNet; stacked convolutional blocks; ENHANCEMENT; MASKING;

DOI:

10.1109/TASLP.2024.3385270

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Reverberant speech in different acoustic environments varies with the reverberation time. However, most deep-learning-based speech dereverberation methods rely on a single deep model to learn the context information, which may bias the model toward only part of the range of reverberation times. In this paper, we propose a multi-resolution framework to address this issue. The framework integrates the dereverberation ability of multiple deep subnetworks with different time resolutions into a unified model by transferring dereverberation information from high-resolution subnetworks to low-resolution subnetworks. As a result, the unified model can perform well under both long and short reverberation times. We further propose two implementations of the framework based on advanced convolutional residual neural networks. The first, named multi-resolution UNet, uses our new UNet implementation built from convolutional blocks as the dereverberation subnetwork. The second, named multi-resolution stacked convolutional blocks, uses our new stacked convolutional blocks as the subnetwork. Experimental results in both simulated and real-world environments show that the proposed algorithms outperform state-of-the-art dereverberation methods in terms of both speech dereverberation evaluation metrics and word error rate (WER) for speech recognition.
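The abstract's premise, that a single model with one fixed temporal context may favor some reverberation times over others, can be made concrete with simple arithmetic. The sketch below is only an illustration under assumed parameters: a 16 ms STFT hop, a 16-frame context per subnetwork, and downsampling factors 1, 2, 4 and 8. None of these values, and none of the helper names (`context_seconds`, `frames_for_rt60`, `covering_resolution`), come from the paper, and the code does not implement the authors' multi-resolution UNet or stacked convolutional blocks.

```python
import math

# Hypothetical parameters chosen for illustration; they are NOT from the paper.
HOP_S = 0.016          # assumed STFT hop of 16 ms
CONTEXT_FRAMES = 16    # assumed number of frames one subnetwork sees at its own resolution


def context_seconds(downsample: int,
                    context_frames: int = CONTEXT_FRAMES,
                    hop_s: float = HOP_S) -> float:
    """Seconds of input covered when each frame at this resolution spans `downsample` hops."""
    return context_frames * downsample * hop_s


def frames_for_rt60(rt60_s: float, hop_s: float = HOP_S) -> int:
    """Number of full-resolution STFT frames spanned by a reverberation tail of rt60_s seconds."""
    return math.ceil(rt60_s / hop_s)


def covering_resolution(rt60_s: float, factors=(1, 2, 4, 8)):
    """Smallest downsampling factor whose context window spans rt60_s, or None if none does."""
    for f in factors:
        if context_seconds(f) >= rt60_s:
            return f
    return None


if __name__ == "__main__":
    for rt60 in (0.3, 1.0, 3.0):
        print(rt60, frames_for_rt60(rt60), covering_resolution(rt60))
```

Under these assumptions, a 0.3 s reverberation tail spans 19 frames and is first covered at factor 2, a 1.0 s tail needs factor 4, and a model fixed at factor 1 sees only 0.256 s of context. This is one way to read why combining subnetworks at several time resolutions could help across both short and long reverberation times.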


Pages: 2338 - 2351

Number of pages: 14
