WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-End Speech Enhancement

Cited by: 53
Authors
Hsieh, Tsun-An [1]
Wang, Hsin-Min [2]
Lu, Xugang [3]
Tsao, Yu [1]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei 11529, Taiwan
[2] Acad Sinica, Inst Informat Sci, Taipei 11529, Taiwan
[3] NICT, Koganei, Tokyo 1848795, Japan
Keywords
Speech enhancement; Feature extraction; Task analysis; Noise reduction; Convolution; Noise measurement; Training; Compressed speech restoration; convolutional recurrent neural networks; raw waveform speech enhancement; simple recurrent unit; DEEP; DOMAIN; SEPARATION;
DOI
10.1109/LSP.2020.3040693
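Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract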
Due to the simple design pipeline, end-to-end (E2E) neural models for speech enhancement (SE) have attracted great interest. In order to improve the performance of the E2E model, the local and sequential properties of speech should be efficiently taken into account when modelling. However, in most current E2E models for SE, these properties are either not fully considered or are too complex to be realized. In this letter, we propose an efficient E2E SE model, termed WaveCRN. Compared with models based on convolutional neural networks (CNN) or long short-term memory (LSTM), WaveCRN uses a CNN module to capture the speech locality features and a stacked simple recurrent units (SRU) module to model the sequential property of the locality features. Different from conventional recurrent neural networks and LSTM, SRU can be efficiently parallelized in calculation, with even fewer model parameters. In order to more effectively suppress noise components in the noisy speech, we derive a novel restricted feature masking approach, which performs enhancement on the feature maps in the hidden layers; this is different from the approaches that apply the estimated ratio mask to the noisy spectral features, which is commonly used in speech separation methods. Experimental results on speech denoising and compressed speech restoration tasks confirm that with the SRU and the restricted feature map, WaveCRN performs comparably to other state-of-the-art approaches with notably reduced model complexity and inference time.
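The two mechanisms named in the abstract, a recurrence that parallelizes well and a mask applied to hidden feature maps rather than to spectral features, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `sru_layer` and `apply_feature_mask`, the scalar (single-unit) parameters, and the use of a sigmoid to bound the mask are all assumptions made for illustration, and the recurrence is a simplified SRU-style form.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sru_layer(xs, w, w_f, b_f, w_r, b_r):
    """Toy single-unit SRU-style layer over a sequence xs (hypothetical parameters).

    The projections below depend only on the inputs, so they can be computed
    for every time step at once; only the cheap element-wise update of the
    state c is sequential.
    """
    x_tilde = [w * x for x in xs]                # candidate values
    f = [sigmoid(w_f * x + b_f) for x in xs]     # forget gates
    r = [sigmoid(w_r * x + b_r) for x in xs]     # highway (reset) gates
    c, hs = 0.0, []
    for t, x in enumerate(xs):
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]  # element-wise recurrence
        hs.append(r[t] * c + (1.0 - r[t]) * x)    # highway connection to the input
    return hs

def apply_feature_mask(features, mask_logits):
    """Multiply a feature map element-wise by a bounded mask.

    Assumption: the mask is bounded with a sigmoid; the abstract does not
    state which bounding function or range is used.
    """
    return [
        [sigmoid(m) * f for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, mask_logits)
    ]

# Usage: mask a 1 x 2 feature map with a neutral (zero-logit) mask.
masked = apply_feature_mask([[2.0, -4.0]], [[0.0, 0.0]])
hidden = sru_layer([2.0, 4.0], 0.0, 0.0, 0.0, 0.0, 0.0)
```

In a full model along the lines the abstract describes, the feature map would come from a convolutional front end applied to the raw waveform, the mask from the stacked recurrent layers, and the masked features would then be mapped back to a waveform.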
Pages: 2149-2153
Number of pages: 5
Related papers
50 in total
  • [21] Towards End-to-End Speech Recognition with Deep Multipath Convolutional Neural Networks
    Zhang, Wei
    Zhai, Minghao
    Huang, Zilong
    Liu, Chen
    Li, Wei
    Cao, Yi
    INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2019, PART VI, 2019, 11745 : 332 - 341
  • [22] End-to-End Unsupervised Deformable Image Registration with a Convolutional Neural Network
    de Vos, Bob D.
    Berendsen, Floris F.
    Viergever, Max A.
    Staring, Marius
    Isgum, Ivana
    DEEP LEARNING IN MEDICAL IMAGE ANALYSIS AND MULTIMODAL LEARNING FOR CLINICAL DECISION SUPPORT, 2017, 10553 : 204 - 212
  • [23] End-to-End PSK Signals Demodulation Using Convolutional Neural Network
    Chen, Wen-Jie
    Wang, Jiao
    Li, Jian-Qing
    IEEE ACCESS, 2022, 10 : 58302 - 58310
  • [24] An End-to-End Dense Connected Heterogeneous Graph Convolutional Neural Network
    Yan, Ranhui
    Cai, Jia
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT I, 2024, 14447 : 462 - 475
  • [25] An End-To-End Hyperbolic Deep Graph Convolutional Neural Network Framework
    Zhou, Yuchen
    Huo, Hongtao
    Hou, Zhiwen
    Bu, Lingbin
    Wang, Yifan
    Mao, Jingyi
    Lv, Xiaojun
    Bu, Fanliang
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2024, 139 (01): : 537 - 563
  • [26] Image reflection removal using end-to-end convolutional neural network
    Li, Jinjiang
    Li, Guihui
    Fan, Hui
    IET IMAGE PROCESSING, 2020, 14 (06) : 1047 - 1058
  • [27] End-to-End Musical Key Estimation Using a Convolutional Neural Network
    Korzeniowski, Filip
    Widmer, Gerhard
    2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017, : 966 - 970
  • [28] Lightweight end-to-end image steganalysis based on convolutional neural network
    Wang, Qun
    Zhang, Minqing
    Li, Jun
    Kong, Yongjun
    JOURNAL OF ELECTRONIC IMAGING, 2021, 30 (06)
  • [29] End-to-End Multispectral Image Compression Using Convolutional Neural Network
    Kong Fanqiang
    Zhou Yongbo
    Shen Qiu
    Wen Keyao
    CHINESE JOURNAL OF LASERS-ZHONGGUO JIGUANG, 2019, 46 (10):
  • [30] ON TRAINING THE RECURRENT NEURAL NETWORK ENCODER-DECODER FOR LARGE VOCABULARY END-TO-END SPEECH RECOGNITION
    Lu, Liang
    Zhang, Xingxing
    Renals, Steve
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5060 - 5064