Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments

Cited: 0
Authors
Chunxi Wang
Maoshen Jia
Xinfeng Zhang
Affiliations
[1] Beijing University of Technology, Faculty of Information Technology
Keywords
Speech separation; Deep learning; Speech enhancement; SISNR
DOI: not available
Abstract
In recent years, speaker-independent single-channel speech separation has made significant progress with the development of deep neural networks (DNNs). However, separating the speech of each speaker of interest from an environment that also contains competing speakers, background noise, and room reverberation remains challenging. To address this problem, a speech separation method for noisy reverberant environments is proposed. Firstly, a time-domain end-to-end network structure, the deep encoder/decoder dual-path neural network, is introduced for speech separation. Secondly, to prevent the model from falling into a local optimum during training, a loss function called the stretched optimal scale-invariant signal-to-noise ratio (SOSISNR) is proposed, inspired by the scale-invariant signal-to-noise ratio (SISNR). In addition, to make training better match the human auditory system, the loss is extended to a joint loss function based on short-time objective intelligibility (STOI). Thirdly, an alignment operation is proposed to reduce the influence of the time delay caused by reverberation on separation performance. Combining these methods, subjective and objective evaluation metrics show that the proposed approach achieves better separation performance in complex sound field environments than the baseline methods.
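The abstract does not give the exact formulation of SOSISNR or the alignment operation, but the SISNR measure it builds on is standard: project the estimated signal onto the reference to obtain a "target" component, treat the residual as noise, and report their power ratio in dB. A minimal sketch in plain Python (function name and epsilon choice are illustrative, not from the paper):

```python
import math

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (higher is better)."""
    # Zero-mean both signals so a DC offset does not affect the measure
    m_e = sum(estimate) / len(estimate)
    m_r = sum(reference) / len(reference)
    est = [e - m_e for e in estimate]
    ref = [r - m_r for r in reference]
    # Project the estimate onto the reference: the "target" component
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    s_target = [dot / ref_energy * r for r in ref]
    # Everything left over counts as noise
    e_noise = [e - t for e, t in zip(est, s_target)]
    target_pow = sum(t * t for t in s_target) + eps
    noise_pow = sum(n * n for n in e_noise) + eps
    return 10 * math.log10(target_pow / noise_pow)
```

Because the projection rescales the reference to match the estimate, multiplying the estimate by any nonzero constant leaves the score unchanged, which is the scale invariance the name refers to. In training, the negative of this quantity is typically minimized as the loss.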
Related Papers (50 total)
  • [21] Embedding Recurrent Layers with Dual-Path Strategy in a Variant of Convolutional Network for Speaker-Independent Speech Separation
    Yang, Xue
    Bao, Changchun
    INTERSPEECH 2022, 2022, : 5338 - 5342
  • [22] LesionScanNet: dual-path convolutional neural network for acute appendicitis diagnosis
    Hariri, Muhab
    Aydin, Ahmet
    Sibic, Osman
    Somuncu, Erkan
    Yilmaz, Serhan
    Sonmez, Suleyman
    Avsar, Ercan
    HEALTH INFORMATION SCIENCE AND SYSTEMS, 2024, 13 (01)
  • [23] A Dual-Path Neural Network for High-Impedance Fault Detection
    Ning, Keqing
    Ye, Lin
    Song, Wei
    Guo, Wei
    Li, Guanyuan
    Yin, Xiang
    Zhang, Mingze
    MATHEMATICS, 2025, 13 (02)
  • [24] DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement
    Le, Xiaohuai
    Chen, Hongsheng
    Chen, Kai
    Lu, Jing
    INTERSPEECH 2021, 2021, : 2811 - 2815
  • [25] Speech Enhancement Based on Dual-Path Cross-Parallel Conformer Network
    Zhao, Qing
    Gao, Ying
    Cai, Zhuoran
    Ou, Shifeng
    IEEE ACCESS, 2024, 12 : 198201 - 198211
  • [26] A dual path encoder-decoder network for placental vessel segmentation in fetoscopic surgery
    Rao, Yunbo
    Tan, Tian
    Zeng, Shaoning
    Chen, Zhanglin
    Sun, Jihong
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2024, 18 (01): 15 - 29
  • [27] Deep Encoder-Decoder Neural Network Architectures for Graph Output Signals
    Rey, Samuel
    Tenorio, Victor
    Rozada, Sergio
    Martino, Luca
    Marques, Antonio G.
    CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2019, : 225 - 229
  • [28] An encoder-decoder deep neural network for binary segmentation of seismic facies
    Lima, Gefersom
    Zeiser, Felipe Andre
    Da Silveira, Ariane
    Rigo, Sandro
    Ramos, Gabriel de Oliveira
    COMPUTERS & GEOSCIENCES, 2024, 183
  • [29] Deep Neural Networks for Speech Enhancement in Complex-Noisy Environments
    Saleem, Nasir
    Khattak, Muhammad Irfan
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2020, 6 (01): 84 - 90
  • [30] A Study of the Recurrent Neural Network Encoder-Decoder for Large Vocabulary Speech Recognition
    Lu, Liang
    Zhang, Xingxing
    Cho, Kyunghyun
    Renals, Steve
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3249 - 3253