DE-DPCTnet: Deep Encoder Dual-path Convolutional Transformer Network for Multi-channel Speech Separation

Citations: 0
Authors
Wang, Zhenyu [1, 2, 4]
Zhou, Yi [1, 2]
Gan, Lu [3, 4]
Chen, Rilin
Tang, Xinyu [1, 2]
Liu, Hongqing [1, 2]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Chongqing 400065, Peoples R China
[2] Chongqing Key Lab Signal & Informat Proc, Chongqing 400065, Peoples R China
[3] Brunel Univ, Coll Engn Design & Phys Sci, London UB8 3PH, England
[4] Tencent AI Lab, Beijing, Peoples R China
Keywords
Speech separation; multi-channel; deep encoder; improved transformer; beamforming; TASNET
DOI
10.1109/SIPS55645.2022.9919247
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, beamforming has been extensively investigated for the multi-channel speech separation task. In this paper, we propose a deep encoder dual-path convolutional transformer network (DE-DPCTnet), which directly estimates the beamforming filters for speech separation in the time domain. To learn the signal repetitions correctly, a nonlinear deep encoder module is proposed to replace the traditional linear one. An improved transformer is also developed, which uses convolutions to capture long-time speech sequences. Ablation studies demonstrate that the deep encoder and the improved transformer both benefit separation performance. Comparisons show that DE-DPCTnet outperforms the state-of-the-art filter-and-sum network with transform-average-concatenate module (FaSNet-TAC), even with lower computational complexity.
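The abstract refers to beamforming filters applied in the time domain. As a rough illustration of the filter-and-sum idea only, the sketch below convolves each microphone channel with its own FIR filter and sums the results. The function name `filter_and_sum`, the array shapes, and the hand-set filter values are assumptions made for this example; in the paper the filters are estimated by the network, which is not reproduced here.

```python
import numpy as np


def filter_and_sum(mixture, filters):
    """Apply one FIR filter per channel and sum the filtered channels.

    mixture: array of shape (channels, samples), the multi-channel input.
    filters: array of shape (channels, taps), one time-domain filter per channel.
    Both shapes are assumptions for this sketch, not taken from the paper.
    """
    channels, samples = mixture.shape
    output = np.zeros(samples)
    for c in range(channels):
        # Full linear convolution, truncated to the input length.
        output += np.convolve(mixture[c], filters[c])[:samples]
    return output


# Toy input: 2 channels, 16 samples of random noise.
rng = np.random.default_rng(0)
mixture = rng.standard_normal((2, 16))

# Hand-set filters (hypothetical): a single tap of 0.5 on each channel,
# which reduces filter-and-sum to a plain average of the two channels.
filters = np.zeros((2, 4))
filters[:, 0] = 0.5

separated = filter_and_sum(mixture, filters)
```

With these trivial filters the output is simply the channel mean; a learned system would instead produce filters that emphasize one speaker and suppress the others.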
Pages: 180-184
Page count: 5