Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments

Cited by: 0
Authors
Chunxi Wang
Maoshen Jia
Xinfeng Zhang
Affiliations
[1] Beijing University of Technology, Faculty of Information Technology
Keywords
Speech separation; Deep learning; Speech enhancement; SISNR
DOI
Not available
Abstract
In recent years, speaker-independent, single-channel speech separation has made significant progress with the development of deep neural networks (DNNs). However, separating the speech of each speaker of interest from a mixture that also contains other speakers, background noise, and room reverberation remains challenging. To address this problem, this paper proposes a speech separation method for noisy, reverberant environments. First, a time-domain, end-to-end network structure, the deep encoder/decoder dual-path neural network, is introduced for speech separation. Second, to keep the model from falling into a local optimum during training, a loss function called the stretched optimal scale-invariant signal-to-noise ratio (SOSISNR) is proposed, inspired by the scale-invariant signal-to-noise ratio (SISNR). In addition, to bring training closer to the human auditory system, the loss is extended into a joint loss function based on short-time objective intelligibility (STOI). Third, an alignment operation is proposed to reduce the influence of the time delay caused by reverberation on separation performance. With these methods combined, both subjective and objective evaluation metrics show that the proposed approach achieves better separation performance in complex sound field environments than the baseline methods.
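For orientation, the scale-invariant signal-to-noise ratio (SISNR) that the abstract cites as the starting point for the proposed loss can be sketched as follows. This is a minimal pure-Python illustration of the commonly used SISNR definition only. It is not the paper's SOSISNR or its STOI-based joint loss, neither of which is specified in this record. The function name `si_snr` and the `eps` stabilizer are assumptions made for the example.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals.

    Illustrative sketch of the standard definition; `eps` is an assumed
    small constant that avoids division by zero and log of zero.
    """
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    n = len(target)

    # Remove the mean so that a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target: this is the part of the
    # estimate that is a scaled copy of the target.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]

    # Everything that is not explained by the target counts as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]

    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(v * v for v in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is the property the name refers to. In training, the negative of this value is typically used as the loss so that minimizing the loss maximizes SISNR.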
Related papers
50 in total
  • [1] Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments
    Wang, Chunxi
    Jia, Maoshen
    Zhang, Xinfeng
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
  • [2] DE-DPCTnet: Deep Encoder Dual-path Convolutional Transformer Network for Multi-channel Speech Separation
    Wang, Zhenyu
    Zhou, Yi
    Gan, Lu
    Chen, Rilin
    Tang, Xinyu
    Liu, Hongqing
    2022 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), 2022, : 180 - 184
  • [3] Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement
    Wang, Junyu
    INTERSPEECH 2023, 2023, : 2853 - 2857
  • [4] Dual-Path Hybrid Attention Network for Monaural Speech Separation
    Qiu, Wenbo
    Hu, Ying
    IEEE ACCESS, 2022, 10 : 78754 - 78763
  • [5] Light-weight speech separation based on dual-path attention and recurrent neural network
    Yang Y.
    Hu Q.
    Zhang P.
    Shengxue Xuebao/Acta Acustica, 2023, 48 (05): 1060 - 1069
  • [6] SEformer: Dual-Path Conformer Neural Network is a Good Speech Denoiser
    Wang, Kai
    Hatzinakos, Dimitrios
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 934 - 940
  • [7] Dual-Path Filter Network: Speaker-Aware Modeling for Speech Separation
    Wang, Fan-Lin
    Peng, Yu-Huai
    Lee, Hung-Shin
    Wang, Hsin-Min
    INTERSPEECH 2021, 2021, : 3061 - 3065
  • [8] DUAL-PATH RNN FOR LONG RECORDING SPEECH SEPARATION
    Li, Chenda
    Luo, Yi
    Han, Cong
    Li, Jinyu
    Yoshioka, Takuya
    Zhou, Tianyan
    Delcroix, Marc
    Kinoshita, Keisuke
    Boeddeker, Christoph
    Qian, Yanmin
    Watanabe, Shinji
    Chen, Zhuo
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 865 - 872
  • [9] Deep encoder and decoder for time-domain speech separation
    Takahashi, Kohei
    Shiraishi, Toshihiko
    MECHANICAL ENGINEERING JOURNAL, 2023, 10 (05):
  • [10] DCE-CDPPTnet: Dense Connected Encoder Cross Dual-path Parrel Transformer Network for Multi-channel Speech Separation
    Zhuang, Chenghao
    Zhou, Lin
    Cao, Yanxiang
    Wang, Qirui
    Cheng, Yunling
    2024 13TH INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS, ICCCAS 2024, 2024, : 303 - 308