DE-DPCTnet: Deep Encoder Dual-path Convolutional Transformer Network for Multi-channel Speech Separation

Authors
Wang, Zhenyu [1, 2, 4]
Zhou, Yi [1, 2]
Gan, Lu [3, 4]
Chen, Rilin
Tang, Xinyu [1, 2]
Liu, Hongqing [1, 2]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Chongqing 400065, Peoples R China
[2] Chongqing Key Lab Signal & Informat Proc, Chongqing 400065, Peoples R China
[3] Brunel Univ, Coll Engn Design & Phys Sci, London UB8 3PH, England
[4] Tencent AI Lab, Beijing, Peoples R China
Keywords
Speech separation; multi-channel; deep encoder; improved transformer; beamforming; TASNET;
DOI
10.1109/SIPS55645.2022.9919247
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In recent years, beamforming has been extensively investigated for the multi-channel speech separation task. In this paper, we propose a deep encoder dual-path convolutional transformer network (DE-DPCTnet), which directly estimates the beamforming filters for speech separation in the time domain. In order to learn the signal repetitions correctly, a nonlinear deep encoder module is proposed to replace the traditional linear one. An improved transformer is also developed, which uses convolutions to capture long-time speech sequences. The ablation studies demonstrate that the deep encoder and the improved transformer both benefit the separation performance. The comparisons show that DE-DPCTnet outperforms the state-of-the-art filter-and-sum network with transform-average-concatenate module (FaSNet-TAC), even with a lower computational complexity.
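As a rough illustration of what "estimating beamforming filters in the time domain" means, the sketch below applies generic filter-and-sum beamforming with NumPy: each microphone channel is convolved with its own FIR filter and the results are summed. This is not the DE-DPCTnet implementation. The function name `filter_and_sum`, the array shapes, and the hand-set filters are assumptions made for illustration; in a network such as the one described, the filters would be predicted by the model (typically per frame) rather than fixed.

```python
import numpy as np


def filter_and_sum(mixture: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Generic time-domain filter-and-sum beamforming (illustrative only).

    mixture: array of shape (channels, samples), one row per microphone.
    filters: array of shape (channels, taps), one FIR filter per microphone.
    Returns a single-channel signal of shape (samples,).
    """
    channels, samples = mixture.shape
    output = np.zeros(samples)
    for c in range(channels):
        # Convolve each channel with its filter and truncate to the input length.
        output += np.convolve(mixture[c], filters[c], mode="full")[:samples]
    return output


# Hypothetical usage: 4 microphones, 16 samples, 3-tap filters.
rng = np.random.default_rng(0)
mixture = rng.standard_normal((4, 16))
filters = np.zeros((4, 3))
filters[0, 0] = 1.0  # unit impulse on channel 0, zeros elsewhere
enhanced = filter_and_sum(mixture, filters)
```

With a unit impulse on one channel and zero filters on the others, the output simply reproduces that channel, which is a convenient sanity check before plugging in learned filters.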
Pages: 180-184
Number of pages: 5
Related papers
50 in total
  • [31] QDPN - Quasi-dual-path Network for single-channel Speech Separation
    Rixen, Joel
    Renz, Matthias
    INTERSPEECH 2022, 2022, : 5353 - 5357
  • [32] DON'T SHOOT BUTTERFLY WITH RIFLES: MULTI-CHANNEL CONTINUOUS SPEECH SEPARATION WITH EARLY EXIT TRANSFORMER
    Chen, Sanyuan
    Wu, Yu
    Chen, Zhuo
    Yoshioka, Takuya
    Liu, Shujie
    Li, Jinyu
    Yu, Xiangzhan
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6139 - 6143
  • [33] DUAL-PATH RNN: EFFICIENT LONG SEQUENCE MODELING FOR TIME-DOMAIN SINGLE-CHANNEL SPEECH SEPARATION
    Luo, Yi
    Chen, Zhuo
    Yoshioka, Takuya
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 46 - 50
  • [34] DMANET: DEEP LEARNING-BASED DIFFERENTIAL MICROPHONE ARRAYS FOR MULTI-CHANNEL SPEECH SEPARATION
    Yang, Xiaokang
    Wei, Jianguo
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4363 - 4367
  • [35] Multi-Channel Speech Separation Using Spatially Selective Deep Non-Linear Filters
    Tesch, Kristina
    Gerkmann, Timo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 542 - 553
  • [36] Dual-Input and Multi-Channel Convolutional Neural Network Model for Vehicle Speed Prediction
    Xing, Jiaming
    Chu, Liang
    Guo, Chong
    Pu, Shilin
    Hou, Zhuoran
    SENSORS, 2021, 21 (22)
  • [37] LEAF CLASSIFICATION USING MARGINALIZED SHAPE CONTEXT AND SHAPE+TEXTURE DUAL-PATH DEEP CONVOLUTIONAL NEURAL NETWORK
    Shah, Meet P.
    Singha, Sougata
    Awate, Suyash P.
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 860 - 864
  • [38] DualTrans: A Novel Glioma Segmentation Framework Based on a Dual-Path Encoder Network and Multi-View Dynamic Fusion Model
    Li, Zongren
    Silamu, Wushouer
    Ma, Yajing
    Li, Yanbing
    APPLIED SCIENCES-BASEL, 2024, 14 (11):
  • [39] Time-frequency Domain Filter-and-sum Network for Multi-channel Speech Separation
    Deng, Zhewen
    Zhou, Yi
    Liu, Hongqing
    INTERSPEECH 2023, 2023, : 3689 - 3693
  • [40] A New Multi-Channel Deep Convolutional Neural Network for Semantic Segmentation of Remote Sensing Image
    Liu, Wenjie
    Zhang, Yongjun
    Fan, Haisheng
    Zou, Yongjie
    Cui, Zhongwei
    IEEE ACCESS, 2020, 8 : 131814 - 131825