WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-End Speech Enhancement

Cited by: 53

Authors:

Hsieh, Tsun-An [1]

Wang, Hsin-Min [2]

Lu, Xugang [3]

Tsao, Yu [1]

Affiliations:

[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei 11529, Taiwan

[2] Acad Sinica, Inst Informat Sci, Taipei 11529, Taiwan

[3] NICT, Koganei, Tokyo 1848795, Japan

Source:

IEEE SIGNAL PROCESSING LETTERS | 2020 / Vol. 27 / No. 27

Keywords:

Speech enhancement; Feature extraction; Task analysis; Noise reduction; Convolution; Noise measurement; Training; Compressed speech restoration; convolutional recurrent neural networks; raw waveform speech enhancement; simple recurrent unit; DEEP; DOMAIN; SEPARATION

DOI:

10.1109/LSP.2020.3040693

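CLC classification number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification codes:

0808; 0809

Abstract: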

Owing to their simple design pipeline, end-to-end (E2E) neural models for speech enhancement (SE) have attracted great interest. To improve the performance of an E2E model, the local and sequential properties of speech should be modeled efficiently. However, most current E2E SE models either do not fully account for these properties or are too complex to realize. In this letter, we propose an efficient E2E SE model, termed WaveCRN. Compared with models based on either convolutional neural networks (CNN) or long short-term memory (LSTM) alone, WaveCRN combines a CNN module that captures the local features of speech with a stacked simple recurrent unit (SRU) module that models the sequential property of these local features. In contrast to conventional recurrent neural networks and LSTM, the SRU can be efficiently parallelized in computation while using even fewer model parameters. To suppress the noise components in noisy speech more effectively, we derive a novel restricted feature masking approach that performs enhancement on the feature maps in the hidden layers; this differs from the approach, common in speech separation, of applying an estimated ratio mask to the noisy spectral features. Experimental results on speech denoising and compressed speech restoration tasks confirm that, with the SRU and restricted feature masking, WaveCRN performs comparably to other state-of-the-art approaches with notably reduced model complexity and inference time.
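The abstract describes a three-stage data flow: a convolutional front end over the raw waveform, an SRU that models the sequence of local features, and a restricted mask applied to the hidden feature map. The pure-Python sketch below illustrates that flow under stated assumptions. The SRU update follows the formulation of Lei et al. (2018) for a single unidirectional layer; the function names (`conv1d`, `sru`, `restricted_mask`), the sizes `C`, `K`, `STRIDE`, the random weights, and the sigmoid used to bound the mask to (0, 1) are all hypothetical illustrations, not WaveCRN's published configuration. The decoder that maps the masked features back to a waveform is omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    # Dense matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def conv1d(signal, kernels, stride):
    # Hypothetical helper: strided 1-D convolution (cross-correlation, as in
    # deep-learning libraries); each kernel maps one window of raw samples
    # to one channel of a frame. Output shape: (frames, channels).
    k = len(kernels[0])
    return [
        [sum(w * s for w, s in zip(kern, signal[t:t + k])) for kern in kernels]
        for t in range(0, len(signal) - k + 1, stride)
    ]


def sru(xs, W, Wf, Wr, vf, vr, bf, br):
    # Hypothetical helper: one unidirectional SRU layer (Lei et al., 2018).
    # Input and hidden sizes must match because of the highway term (1 - r) * x.
    # The matrix products depend only on x_t, so they are computed for all
    # frames up front; this is the part that can run in parallel.
    xw = [matvec(W, x) for x in xs]
    fw = [matvec(Wf, x) for x in xs]
    rw = [matvec(Wr, x) for x in xs]
    c = [0.0] * len(W)
    hs = []
    for x, xt, fa, ra in zip(xs, xw, fw, rw):
        # Only these element-wise updates are sequential; both gates read c_{t-1}.
        f = [sigmoid(a + v * cj + b) for a, v, cj, b in zip(fa, vf, c, bf)]
        r = [sigmoid(a + v * cj + b) for a, v, cj, b in zip(ra, vr, c, br)]
        c = [fj * cj + (1.0 - fj) * xj for fj, cj, xj in zip(f, c, xt)]
        hs.append([rj * cj + (1.0 - rj) * xj for rj, cj, xj in zip(r, c, x)])
    return hs


def restricted_mask(features, hidden, Wm, bm):
    # Hypothetical restricted feature masking: project each SRU output to a
    # mask the size of the conv frame, bound it to (0, 1) with a sigmoid
    # (an assumed bound), and scale the hidden feature map element-wise.
    out = []
    for frame, h in zip(features, hidden):
        m = [sigmoid(z + b) for z, b in zip(matvec(Wm, h), bm)]
        out.append([mj * fj for mj, fj in zip(m, frame)])
    return out


def rand_mat(rows, cols, scale=0.3):
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


# Toy run with arbitrary sizes and random weights (illustrative only).
random.seed(0)
C, K, STRIDE = 4, 8, 4  # channels, kernel length, hop (assumed values)
noisy = [math.sin(0.1 * n) + random.gauss(0.0, 0.1) for n in range(64)]
feats = conv1d(noisy, rand_mat(C, K), STRIDE)
zeros = [0.0] * C
hidden = sru(feats, rand_mat(C, C), rand_mat(C, C), rand_mat(C, C),
             [0.1] * C, [0.1] * C, zeros, zeros)
masked = restricted_mask(feats, hidden, rand_mat(C, C), zeros)
print(len(feats), len(masked[0]))  # prints: 15 4
```

The parallelism claim is visible in `sru`: the three matrix products depend only on the inputs, so they can be batched over all frames, leaving only cheap element-wise operations inside the sequential loop. An LSTM, by contrast, multiplies the previous hidden state by a weight matrix at every time step, which forces each matrix product to wait for the previous step.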

Pages: 2149-2153

Number of pages: 5
