Multi-Resolution Convolutional Residual Neural Networks for Monaural Speech Dereverberation

Authors
Zhao, Lei [1 ,2 ]
Zhu, Wenbo [1 ,2 ]
Li, Shengqiang [1 ,2 ]
Luo, Hong [3 ]
Zhang, Xiao-Lei [1 ,2 ]
Rahardja, Susanto [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[2] Inst Northwestern Polytech Univ, Res & Dev, Shenzhen 518063, Peoples R China
[3] China Mobile Hangzhou Informat Technol Co Ltd, Hangzhou 311199, Peoples R China
Funding
National Science Foundation (United States);
Keywords
Reverberation; Spectrogram; Convolution; Speech recognition; Feature extraction; Deep learning; Context modeling; Multi-resolution framework; speech dereverberation; UNet; stacked convolutional blocks; ENHANCEMENT; MASKING;
DOI
10.1109/TASLP.2024.3385270
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Reverberant speech varies across acoustic environments depending on the reverberation time. However, most deep-learning-based speech dereverberation methods rely on a single deep model to learn the context information, which may bias the model toward only part of the range of reverberation times. In this paper, we propose a multi-resolution framework to address this issue. The framework integrates the dereverberation ability of multiple deep subnetworks with different time resolutions into a unified model by transferring dereverberation information from the high-resolution subnetworks to the low-resolution subnetworks. As a result, the unified model can perform well for both long and short reverberation times. We further propose two implementations of the framework based on advanced convolutional residual neural networks. The first, named multi-resolution UNet, uses our new UNet implementation built from convolutional blocks as the dereverberation subnetwork. The second, named multi-resolution stacked convolutional blocks, uses our new stacked convolutional blocks as the subnetwork. Experimental results in both simulated and real-world environments show that the proposed algorithms outperform state-of-the-art dereverberation methods in terms of both speech dereverberation evaluation metrics and word error rate (WER) for speech recognition.
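To make the motivation concrete: a reverberation tail of length T60 (the time for reverberant energy to decay by 60 dB) spans a number of short-time frames roughly proportional to T60, so a single fixed temporal context cannot match short and long tails equally well. The sketch below counts the frames spanned by tails of different lengths, assuming 16 kHz audio and a 256-sample hop; both values, and the helper name `tail_frames`, are illustrative assumptions, not settings from the paper. It illustrates the problem the multi-resolution framework targets and is not the authors' method.

```python
import math


def tail_frames(t60_s: float, sample_rate: int = 16000, hop: int = 256) -> int:
    """Hypothetical helper: number of frame hops covered by a tail of T60 seconds.

    sample_rate and hop are assumed illustrative defaults, not values from the paper.
    """
    # Convert the tail duration to samples, then to hops, rounding up so the
    # whole tail is covered.
    return math.ceil(t60_s * sample_rate / hop)


if __name__ == "__main__":
    for t60 in (0.3, 0.6, 1.0):
        print(f"T60 = {t60:.1f} s -> about {tail_frames(t60)} frames of context")
```

Under these assumptions a 0.3 s tail covers about 19 frames while a 1.0 s tail covers about 63, so a model whose context suits one regime is poorly matched to the other.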
Pages: 2338-2351
Number of pages: 14