Speech Separation for an Unknown Number of Speakers Using Transformers With Encoder-Decoder Attractors

Cited by: 1
Authors
Chetupalli, Srikanth Raj [1]
Habets, Emanuel A. P. [1]
Affiliations
[1] Int Audio Labs Erlangen, Wolfsmantel 33, D-91058 Erlangen, Germany
Source
Keywords
source separation; speaker counting; attractors; transformers
DOI
10.21437/Interspeech.2022-10849
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speaker-independent speech separation for single-channel mixtures with an unknown number of speakers in the waveform domain is considered in this paper. To deal with the unknown number of sources, we incorporate an encoder-decoder attractor (EDA) module into a speech separation network. The neural network architecture consists of a trainable encoder-decoder pair and a masking network. The masking network in the proposed approach is inspired by the transformer-based SepFormer separation system. It contains a dual-path block and a triple-path block, each block modeling both short-time and long-time dependencies in the signal. The EDA module first summarises the dual-path block output using an LSTM encoder and generates one attractor vector per speaker in the mixture using an LSTM decoder. The attractors are combined with the dual-path block output to generate speaker channels, which are processed jointly by the triple-path block to predict the mask. Further, a linear-sigmoid layer, with attractors as the input, predicts a binary output to indicate a stopping criterion for attractor generation. The proposed approach is evaluated on the WSJ0-mix dataset with mixtures of up to five speakers. State-of-the-art results are obtained in speech separation quality and speaker counting for all the mixtures.
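The stopping criterion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 0.5 threshold, the cap of five speakers, and the toy vectors standing in for the LSTM decoder outputs are all assumptions made for illustration. It only shows how a linear-sigmoid layer applied to each attractor could decide when to stop generating attractors, which in turn gives the estimated number of speakers.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def existence_probability(attractor, weights, bias):
    """Linear-sigmoid layer: one attractor vector in, one probability out."""
    linear = sum(w * a for w, a in zip(weights, attractor)) + bias
    return sigmoid(linear)


def generate_attractors(decoder_step, weights, bias, max_speakers=5, threshold=0.5):
    """Collect attractors until the linear-sigmoid layer signals "stop".

    decoder_step is a hypothetical callable that returns the next attractor
    vector; in the paper this role is played by an LSTM decoder.
    The threshold and max_speakers values are assumptions.
    """
    attractors = []
    for _ in range(max_speakers):
        attractor = decoder_step()
        if existence_probability(attractor, weights, bias) < threshold:
            break  # binary output says no further speaker is present
        attractors.append(attractor)
    return attractors


# Toy stand-in for the decoder outputs (made-up numbers, not real data).
toy_outputs = iter([[2.0, 1.0], [1.5, 0.5], [-3.0, -1.0], [2.0, 2.0]])
attractors = generate_attractors(lambda: next(toy_outputs), weights=[1.0, 1.0], bias=0.0)
estimated_speakers = len(attractors)
print(estimated_speakers)
```

With these toy values the third vector falls below the threshold, so generation stops after two attractors and the fourth vector is never requested. How the attractors are then combined with the dual-path block output is not specified in the abstract and is left out of the sketch.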
Pages: 5393-5397
Number of pages: 5