DE-DPCTnet: Deep Encoder Dual-path Convolutional Transformer Network for Multi-channel Speech Separation

Cited by: 0

Authors:

Wang, Zhenyu [1,2,4]

Zhou, Yi [1,2]

Gan, Lu [3,4]

Chen, Rilin

Tang, Xinyu [1,2]

Liu, Hongqing [1,2]

Affiliations:

[1] Chongqing Univ Posts & Telecommun, Chongqing 400065, Peoples R China

[2] Chongqing Key Lab Signal & Informat Proc, Chongqing 400065, Peoples R China

[3] Brunel Univ, Coll Engn Design & Phys Sci, London UB8 3PH, England

[4] Tencent AI Lab, Beijing, Peoples R China

Source:

2022 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS) | 2022

Keywords:

Speech separation; multi-channel; deep encoder; improved transformer; beamforming; TasNet

DOI:

10.1109/SIPS55645.2022.9919247

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In recent years, beamforming has been extensively investigated for the multi-channel speech separation task. In this paper, we propose a deep encoder dual-path convolutional transformer network (DE-DPCTnet), which directly estimates the beamforming filters for speech separation in the time domain. To learn the signal repetitions correctly, a nonlinear deep encoder module is proposed to replace the traditional linear one. An improved transformer is also developed, which uses convolutions to capture long-time speech sequences. Ablation studies demonstrate that both the deep encoder and the improved transformer benefit separation performance. Comparisons show that DE-DPCTnet outperforms the state-of-the-art filter-and-sum network with transform-average-concatenate module (FaSNet-TAC) while having lower computational complexity.
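
The time-domain filter-and-sum operation that the abstract refers to can be illustrated with a minimal sketch. This is not the authors' implementation: in DE-DPCTnet the filters are estimated by the network, whereas here `beam_filters` is a hand-picked, hypothetical set of taps, and the function names `convolve` and `filter_and_sum` and the toy signals are assumptions made only for illustration.

```python
def convolve(signal, taps):
    """Full linear convolution of two 1-D sequences."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += s * h
    return out


def filter_and_sum(channels, filters):
    """Apply one FIR filter per microphone channel and sum the results."""
    if len(channels) != len(filters):
        raise ValueError("need exactly one filter per channel")
    filtered = [convolve(x, h) for x, h in zip(channels, filters)]
    out = [0.0] * max(len(y) for y in filtered)
    for y in filtered:
        for n, v in enumerate(y):
            out[n] += v
    return out


# Toy two-microphone frame: the second microphone receives the same
# source one sample later than the first (assumed data, not from the paper).
mic_frames = [[1.0, 2.0, 3.0, 0.0], [0.0, 1.0, 2.0, 3.0]]

# Hypothetical filters standing in for network-estimated ones: delay
# channel 0 by one sample so both channels align, then average them.
beam_filters = [[0.0, 0.5], [0.5, 0.0]]

separated = filter_and_sum(mic_frames, beam_filters)
print(separated)
```

With these placeholder taps the two channels are time-aligned before summation, so the source adds coherently. A learned system would presumably replace the fixed taps with filters predicted per frame and per target speaker.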


Pages: 180-184

Number of pages: 5
