Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments

Cited: 0
Authors
Chunxi Wang
Maoshen Jia
Xinfeng Zhang
Affiliations
[1] Beijing University of Technology, Faculty of Information Technology
Keywords
Speech separation; Deep learning; Speech enhancement; SISNR
DOI: not available
Abstract
In recent years, speaker-independent single-channel speech separation has made significant progress with the development of deep neural networks (DNNs). However, separating the speech of each speaker of interest from an environment that also contains competing speakers, background noise, and room reverberation remains challenging. To address this problem, a speech separation method for noisy reverberant environments is proposed. Firstly, a time-domain end-to-end network structure, the deep encoder/decoder dual-path neural network, is introduced for speech separation. Secondly, to prevent the model from falling into a local optimum during training, a loss function called the stretched optimal scale-invariant signal-to-noise ratio (SOSISNR) is proposed, inspired by the scale-invariant signal-to-noise ratio (SISNR). In addition, to make training better match the human auditory system, the loss is extended to a joint loss function based on short-time objective intelligibility (STOI). Thirdly, an alignment operation is proposed to reduce the influence of the time delay caused by reverberation on separation performance. Combining these methods, subjective and objective evaluation metrics show that the proposed approach achieves better separation performance in complex sound field environments than the baseline methods.
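The abstract does not give the exact formulation of SOSISNR or the alignment operation, but the SISNR measure it builds on is standard: project the estimated signal onto the reference to obtain a "target" component, treat the residual as noise, and report their power ratio in dB. A minimal sketch in plain Python (function name and epsilon choice are illustrative, not from the paper):

```python
import math

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (higher is better)."""
    # Zero-mean both signals so a DC offset does not affect the measure
    m_e = sum(estimate) / len(estimate)
    m_r = sum(reference) / len(reference)
    est = [e - m_e for e in estimate]
    ref = [r - m_r for r in reference]
    # Project the estimate onto the reference: the "target" component
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    s_target = [dot / ref_energy * r for r in ref]
    # Everything left over counts as noise
    e_noise = [e - t for e, t in zip(est, s_target)]
    target_pow = sum(t * t for t in s_target) + eps
    noise_pow = sum(n * n for n in e_noise) + eps
    return 10 * math.log10(target_pow / noise_pow)
```

Because the projection rescales the reference to match the estimate, multiplying the estimate by any nonzero constant leaves the score unchanged, which is the scale invariance the name refers to. In training, the negative of this quantity is typically minimized as the loss.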
Related Papers (50 total)
  • [21] Embedding Recurrent Layers with Dual-Path Strategy in a Variant of Convolutional Network for Speaker-Independent Speech Separation
    Yang, Xue
    Bao, Changchun
    INTERSPEECH 2022, 2022, : 5338 - 5342
  • [22] LesionScanNet: dual-path convolutional neural network for acute appendicitis diagnosis
    Hariri, Muhab
    Aydin, Ahmet
    Sibic, Osman
    Somuncu, Erkan
    Yilmaz, Serhan
    Sonmez, Suleyman
    Avsar, Ercan
    HEALTH INFORMATION SCIENCE AND SYSTEMS, 2024, 13 (01)
  • [23] A Dual-Path Neural Network for High-Impedance Fault Detection
    Ning, Keqing
    Ye, Lin
    Song, Wei
    Guo, Wei
    Li, Guanyuan
    Yin, Xiang
    Zhang, Mingze
    MATHEMATICS, 2025, 13 (02)
  • [24] DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement
    Le, Xiaohuai
    Chen, Hongsheng
    Chen, Kai
    Lu, Jing
    INTERSPEECH 2021, 2021, : 2811 - 2815
  • [25] Speech Enhancement Based on Dual-Path Cross-Parallel Conformer Network
    Zhao, Qing
    Gao, Ying
    Cai, Zhuoran
    Ou, Shifeng
    IEEE ACCESS, 2024, 12 : 198201 - 198211
  • [26] A dual path encoder-decoder network for placental vessel segmentation in fetoscopic surgery
    Rao, Yunbo
    Tan, Tian
    Zeng, Shaoning
    Chen, Zhanglin
    Sun, Jihong
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2024, 18 (01): 15 - 29
  • [27] Deep Encoder-Decoder Neural Network Architectures for Graph Output Signals
    Rey, Samuel
    Tenorio, Victor
    Rozada, Sergio
    Martino, Luca
    Marques, Antonio G.
    CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2019, : 225 - 229
  • [28] An encoder-decoder deep neural network for binary segmentation of seismic facies
    Lima, Gefersom
    Zeiser, Felipe Andre
    Da Silveira, Ariane
    Rigo, Sandro
    Ramos, Gabriel de Oliveira
    COMPUTERS & GEOSCIENCES, 2024, 183
  • [29] Deep Neural Networks for Speech Enhancement in Complex-Noisy Environments
    Saleem, Nasir
    Khattak, Muhammad Irfan
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2020, 6 (01): 84 - 90
  • [30] A Study of the Recurrent Neural Network Encoder-Decoder for Large Vocabulary Speech Recognition
    Lu, Liang
    Zhang, Xingxing
    Cho, Kyunghyun
    Renals, Steve
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3249 - 3253