DE-DPCTnet: Deep Encoder Dual-path Convolutional Transformer Network for Multi-channel Speech Separation

Cited by: 0

Authors:

Wang, Zhenyu [1,2,4]

Zhou, Yi [1,2]

Gan, Lu [3,4]

Chen, Rilin

Tang, Xinyu [1,2]

Liu, Hongqing [1,2]

Affiliations:

[1] Chongqing Univ Posts & Telecommun, Chongqing 400065, Peoples R China

[2] Chongqing Key Lab Signal & Informat Proc, Chongqing 400065, Peoples R China

[3] Brunel Univ, Coll Engn Design & Phys Sci, London UB8 3PH, England

[4] Tencent AI Lab, Beijing, Peoples R China

Source:

2022 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS) | 2022

Keywords:

Speech separation; multi-channel; deep encoder; improved transformer; beamforming; TASNET;

DOI:

10.1109/SIPS55645.2022.9919247

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In recent years, beamforming has been extensively investigated for the multi-channel speech separation task. In this paper, we propose a deep encoder dual-path convolutional transformer network (DE-DPCTnet), which directly estimates the beamforming filters for speech separation in the time domain. To learn the signal repetitions correctly, a nonlinear deep encoder module is proposed to replace the traditional linear one. An improved transformer is also developed that uses convolutions to capture long-time speech sequences. Ablation studies demonstrate that the deep encoder and the improved transformer both benefit separation performance. Comparisons show that DE-DPCTnet outperforms the state-of-the-art filter-and-sum network with transform-average-concatenate module (FaSNet-TAC), even with lower computational complexity.


Pages: 180-184

Page count: 5

50 items in total

[41] Implicit Filter-and-sum Network for End-to-end Multi-channel Speech Separation
Luo, Yi
Mesgarani, Nima
INTERSPEECH 2021, 2021, : 3071 - 3075
[42] Multi-channel Image Registration of Cardiac MR Using Supervised Feature Learning with Convolutional Encoder-Decoder Network
Lu, Xuesong
Qiao, Yuchuan
BIOMEDICAL IMAGE REGISTRATION (WBIR 2020), 2020, 12120 : 103 - 110
[43] Multi-Channel EEG Emotion Recognition Based on Parallel Transformer and 3D-Convolutional Neural Network
Sun, Jie
Wang, Xuan
Zhao, Kun
Hao, Siyuan
Wang, Tianyu
MATHEMATICS, 2022, 10 (17)
[44] Multi-Channel Audio Source Separation Using Azimuth-Frequency Analysis and Convolutional Neural Network
Moon, Jung Min
Kim, Jun Ho
Kim, Tae Woo
Chun, Chan Jun
Kim, Hong Kook
2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (ICAIIC 2019), 2019, : 500 - 503
[45] Gated Recurrent Fusion of Spatial and Spectral Features for Multi-channel Speech Separation with Deep Embedding Representations
Fan, Cunhang
Tao, Jianhua
Liu, Bin
Yi, Jiangyan
Wen, Zhengqi
INTERSPEECH 2020, 2020, : 3321 - 3325
[46] MULTI-CHANNEL DEEP CLUSTERING: DISCRIMINATIVE SPECTRAL AND SPATIAL EMBEDDINGS FOR SPEAKER-INDEPENDENT SPEECH SEPARATION
Wang, Zhong-Qiu
Le Roux, Jonathan
Hershey, John R.
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 1 - 5
[47] DPHT-ANet: Dual-path high-order transformer-style fully attentional network for monaural speech enhancement
Saleem, Nasir
Bourouis, Sami
Elmannai, Hela
Algarni, Abeer D.
APPLIED ACOUSTICS, 2024, 224
[48] DPT-FSNET: DUAL-PATH TRANSFORMER BASED FULL-BAND AND SUB-BAND FUSION NETWORK FOR SPEECH ENHANCEMENT
Dang, Feng
Chen, Hangting
Zhang, Pengyuan
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6857 - 6861
[49] Comparative Analysis of Multilayer Backpropagation and Multi-Channel Deep Convolutional Neural Network for Human Activity Recognition
Priyadharshini, J. Mary Hanna
Kavitha, S.
Bharathi, B.
RECENT DEVELOPMENTS IN MATHEMATICAL ANALYSIS AND COMPUTING, 2019, 2095
[50] Opacity annotation of diffuse lung diseases using deep convolutional neural network with multi-channel information
Mabu, Shingo
Kido, Shoji
Hashimoto, Noriaki
Hirano, Yasushi
Kuremoto, Takashi
MEDICAL IMAGING 2018: COMPUTER-AIDED DIAGNOSIS, 2018, 10575
