Multi-Resolution Convolutional Residual Neural Networks for Monaural Speech Dereverberation

Cited by: 0
Authors
Zhao, Lei [1, 2]
Zhu, Wenbo [1, 2]
Li, Shengqiang [1, 2]
Luo, Hong [3]
Zhang, Xiao-Lei [1, 2]
Rahardja, Susanto [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[2] Inst Northwestern Polytech Univ, Res & Dev, Shenzhen 518063, Peoples R China
[3] China Mobile Hangzhou Informat Technol Co Ltd, Hangzhou 311199, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Reverberation; Spectrogram; Convolution; Speech recognition; Feature extraction; Deep learning; Context modeling; Multi-resolution framework; speech dereverberation; UNet; stacked convolutional blocks; ENHANCEMENT; MASKING;
DOI
10.1109/TASLP.2024.3385270
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
It is known that the reverberant speech in different acoustic environments varies according to reverberation time. However, most deep learning based speech dereverberation methods rely on a single deep model to learn the context information. It may make the deep model biased to only part of the reverberant time durations. In this paper, we propose a multi-resolution framework to address this issue. The framework integrates the dereverberant ability of multiple deep subnetworks with different time resolutions into a unified model by transferring the dereverberant information from high-resolution subnetworks to low-resolution subnetworks. By doing so, the unified model can perform well in both long and short reverberant time. We further propose two implementations of the framework based on advanced convolutional residual neural networks. The first implementation, named multi-resolution UNet, uses our new implementation of UNet based on convolutional blocks as the dereverberation subnetwork. The second implementation, named multi-resolution stacked convolutional blocks, uses our new stacked convolutional blocks as the subnetwork. Experimental results in both simulated and real-world environments show that the proposed algorithms outperform the state-of-the-art dereverberation methods in terms of both the evaluation metrics for speech dereverberation and word error rate (WER) for speech recognition.
Pages: 2338-2351
Number of pages: 14
Related papers
50 in total
  • [1] Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation
    Grais, Emad M.
    Wierstorf, Hagen
    Ward, Dominic
    Plumbley, Mark D.
    [J]. LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION (LVA/ICA 2018), 2018, 10891 : 340 - 350
  • [2] Monaural Speech Dereverberation Using Deformable Convolutional Networks
    Kothapally, Vinay
    Hansen, John H. L.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1712 - 1723
  • [3] UTTERANCE WEIGHTED MULTI-DILATION TEMPORAL CONVOLUTIONAL NETWORKS FOR MONAURAL SPEECH DEREVERBERATION
    Ravenscroft, William
    Goetze, Stefan
    Hain, Thomas
    [J]. 2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
  • [4] Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation
    Ravenscroft, William
    Goetze, Stefan
    Hain, Thomas
    [J]. 2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 80 - 84
  • [5] Monaural Speech Dereverberation Using Temporal Convolutional Networks With Self Attention
    Zhao, Yan
    Wang, DeLiang
    Xu, Buye
    Zhang, Tao
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1598 - 1607
  • [6] Multi-resolution convolutional neural networks for inverse problems
    Wang, Feng
    Eljarrat, Alberto
    Mueller, Johannes
    Henninen, Trond R.
    Erni, Rolf
    Koch, Christoph T.
    [J]. SCIENTIFIC REPORTS, 2020, 10 (01)
  • [7] Multi-resolution convolutional neural networks for inverse problems
    Feng Wang
    Alberto Eljarrat
    Johannes Müller
    Trond R. Henninen
    Rolf Erni
    Christoph T. Koch
    [J]. Scientific Reports, 10
  • [8] Multi-Resolution for Disparity Estimation with Convolutional Neural Networks
    Jammal, Samer
    Tillo, Tammam
    Xiao, Jimin
    [J]. 2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1756 - 1761
  • [9] Multi-Resolution Convolutional Recurrent Networks
    Chien, Jen-Tzung
    Huang, Yu-Min
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 2043 - 2048
  • [10] An interactive instance segmentation system with multi-resolution convolutional neural networks
    Sung, Po-Wei
    Yang, Wei-Jong
    Yang, Jar-Ferr
    Chan, Din-Yuan
    [J]. IET COMPUTER VISION, 2021, 15 (02) : 99 - 109