DE-DPCTnet: Deep Encoder Dual-path Convolutional Transformer Network for Multi-channel Speech Separation

Authors
Wang, Zhenyu [1, 2, 4]
Zhou, Yi [1, 2]
Gan, Lu [3, 4]
Chen, Rilin
Tang, Xinyu [1, 2]
Liu, Hongqing [1, 2]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Chongqing 400065, Peoples R China
[2] Chongqing Key Lab Signal & Informat Proc, Chongqing 400065, Peoples R China
[3] Brunel Univ, Coll Engn Design & Phys Sci, London UB8 3PH, England
[4] Tencent AI Lab, Beijing, Peoples R China
Keywords
Speech separation; multi-channel; deep encoder; improved transformer; beamforming; TASNET;
DOI
10.1109/SIPS55645.2022.9919247
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In recent years, beamforming has been extensively investigated for the multi-channel speech separation task. In this paper, we propose a deep encoder dual-path convolutional transformer network (DE-DPCTnet), which directly estimates the beamforming filters for speech separation in the time domain. To learn the signal repetitions correctly, a nonlinear deep encoder module is proposed to replace the traditional linear one. An improved transformer is also developed that uses convolutions to capture long-time speech sequences. Ablation studies demonstrate that the deep encoder and the improved transformer both benefit separation performance. Comparisons show that DE-DPCTnet outperforms the state-of-the-art filter-and-sum network with transform-average-concatenate module (FaSNet-TAC), even with lower computational complexity.
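As a rough illustration of the filter-and-sum idea behind time-domain beamforming mentioned in the abstract, the sketch below convolves each microphone channel with its own filter and sums the results. The function name `filter_and_sum`, the array shapes, and the hand-picked filters are assumptions made for illustration only; in DE-DPCTnet the filters are estimated by the network, which is not reproduced here.

```python
import numpy as np


def filter_and_sum(frames, filters):
    """Apply one FIR filter per channel and sum across channels.

    frames:  array of shape (channels, frame_len), one row per microphone
    filters: array of shape (channels, filter_len), one filter per microphone
             (hypothetical fixed values here; a network would estimate them)
    Returns an array of length frame_len - filter_len + 1.
    """
    frames = np.asarray(frames, dtype=float)
    filters = np.asarray(filters, dtype=float)
    if frames.shape[0] != filters.shape[0]:
        raise ValueError("frames and filters must have the same channel count")
    # 'valid' keeps only samples where the filter fully overlaps the frame.
    per_channel = [
        np.convolve(x, h, mode="valid") for x, h in zip(frames, filters)
    ]
    return np.sum(per_channel, axis=0)


# Two-channel toy input with single-tap filters that average the channels.
toy_frames = np.array([[1.0, 2.0, 3.0, 4.0],
                       [3.0, 2.0, 1.0, 0.0]])
toy_filters = np.array([[0.5],
                        [0.5]])
toy_output = filter_and_sum(toy_frames, toy_filters)
```

With single-tap filters of 0.5 the output is just the per-sample average of the two channels; longer filters let each channel be delayed and weighted before summation, which is what allows a beamformer to emphasize one source.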
Pages: 180-184
Page count: 5
Related papers
50 in total
  • [41] Implicit Filter-and-sum Network for End-to-end Multi-channel Speech Separation
    Luo, Yi
    Mesgarani, Nima
    INTERSPEECH 2021, 2021, : 3071 - 3075
  • [42] Multi-channel Image Registration of Cardiac MR Using Supervised Feature Learning with Convolutional Encoder-Decoder Network
    Lu, Xuesong
    Qiao, Yuchuan
    BIOMEDICAL IMAGE REGISTRATION (WBIR 2020), 2020, 12120 : 103 - 110
  • [43] Multi-Channel EEG Emotion Recognition Based on Parallel Transformer and 3D-Convolutional Neural Network
    Sun, Jie
    Wang, Xuan
    Zhao, Kun
    Hao, Siyuan
    Wang, Tianyu
    MATHEMATICS, 2022, 10 (17)
  • [44] Multi-Channel Audio Source Separation Using Azimuth-Frequency Analysis and Convolutional Neural Network
    Moon, Jung Min
    Kim, Jun Ho
    Kim, Tae Woo
    Chun, Chan Jun
    Kim, Hong Kook
    2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (ICAIIC 2019), 2019, : 500 - 503
  • [45] Gated Recurrent Fusion of Spatial and Spectral Features for Multi-channel Speech Separation with Deep Embedding Representations
    Fan, Cunhang
    Tao, Jianhua
    Liu, Bin
    Yi, Jiangyan
    Wen, Zhengqi
    INTERSPEECH 2020, 2020, : 3321 - 3325
  • [46] MULTI-CHANNEL DEEP CLUSTERING: DISCRIMINATIVE SPECTRAL AND SPATIAL EMBEDDINGS FOR SPEAKER-INDEPENDENT SPEECH SEPARATION
    Wang, Zhong-Qiu
    Le Roux, Jonathan
    Hershey, John R.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 1 - 5
  • [47] DPHT-ANet: Dual-path high-order transformer-style fully attentional network for monaural speech enhancement
    Saleem, Nasir
    Bourouis, Sami
    Elmannai, Hela
    Algarni, Abeer D.
    APPLIED ACOUSTICS, 2024, 224
  • [48] DPT-FSNET: DUAL-PATH TRANSFORMER BASED FULL-BAND AND SUB-BAND FUSION NETWORK FOR SPEECH ENHANCEMENT
    Dang, Feng
    Chen, Hangting
    Zhang, Pengyuan
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6857 - 6861
  • [49] Comparative Analysis of Multilayer Backpropagation and Multi-Channel Deep Convolutional Neural Network for Human Activity Recognition
    Priyadharshini, J. Mary Hanna
    Kavitha, S.
    Bharathi, B.
    RECENT DEVELOPMENTS IN MATHEMATICAL ANALYSIS AND COMPUTING, 2019, 2095
  • [50] Opacity annotation of diffuse lung diseases using deep convolutional neural network with multi-channel information
    Mabu, Shingo
    Kido, Shoji
    Hashimoto, Noriaki
    Hirano, Yasushi
    Kuremoto, Takashi
    MEDICAL IMAGING 2018: COMPUTER-AIDED DIAGNOSIS, 2018, 10575