Speech Separation for an Unknown Number of Speakers Using Transformers With Encoder-Decoder Attractors

Cited by: 1
Authors
Chetupalli, Srikanth Raj [1]
Habets, Emanuel A. P. [1]
Affiliations
[1] Int Audio Labs Erlangen, Wolfsmantel 33, D-91058 Erlangen, Germany
Source
Keywords
source separation; speaker counting; attractors; transformers
DOI
10.21437/Interspeech.2022-10849
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper considers speaker-independent, waveform-domain speech separation for single-channel mixtures with an unknown number of speakers. To deal with the unknown number of sources, we incorporate an encoder-decoder attractor (EDA) module into a speech separation network. The neural network architecture consists of a trainable encoder-decoder pair and a masking network. The masking network in the proposed approach is inspired by the transformer-based SepFormer separation system. It contains a dual-path block and a triple-path block, each modeling both short-time and long-time dependencies in the signal. The EDA module first summarises the dual-path block output using an LSTM encoder and then generates one attractor vector per speaker in the mixture using an LSTM decoder. The attractors are combined with the dual-path block output to generate speaker channels, which are processed jointly by the triple-path block to predict the mask. Further, a linear-sigmoid layer, with the attractors as input, predicts a binary output that serves as the stopping criterion for attractor generation. The proposed approach is evaluated on the WSJ0-mix dataset with mixtures of up to five speakers. State-of-the-art results are obtained in speech separation quality and speaker counting for all the mixtures.
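The stopping criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the LSTM decoder is replaced by a hypothetical toy function `toy_decoder_step`, and the linear-layer parameters `weights` and `bias`, the 0.5 threshold, and the `max_steps` cap are all assumptions chosen for illustration. The sketch only shows how a linear-sigmoid layer applied to each attractor could decide when to stop generating attractors, which also yields a speaker count.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def existence_probability(attractor: Sequence[float],
                          weights: Sequence[float],
                          bias: float) -> float:
    """Linear layer followed by a sigmoid, applied to one attractor."""
    logit = sum(w * a for w, a in zip(weights, attractor)) + bias
    return sigmoid(logit)


def generate_attractors(decoder_step: Callable[[int], Vector],
                        weights: Sequence[float],
                        bias: float,
                        max_steps: int = 6,
                        threshold: float = 0.5) -> List[Vector]:
    """Collect attractors until the predicted existence probability
    drops below the threshold (assumed stopping rule)."""
    attractors: List[Vector] = []
    for step in range(max_steps):
        candidate = decoder_step(step)
        if existence_probability(candidate, weights, bias) < threshold:
            break  # binary output says "no more speakers"
        attractors.append(candidate)
    return attractors


def toy_decoder_step(step: int) -> Vector:
    """Hypothetical stand-in for the LSTM decoder: the first
    component shrinks with each step, so later attractors look
    less like a real speaker to the linear-sigmoid layer."""
    return [2.0 - step, 1.0]


# Assumed linear-layer parameters, for illustration only.
weights = [1.0, 0.0]
bias = -0.5

attractors = generate_attractors(toy_decoder_step, weights, bias)
estimated_speakers = len(attractors)
print(estimated_speakers, attractors)
```

In this toy setup the number of attractors kept before the stop decision doubles as the estimated speaker count; in a trained system, both the decoder and the linear-sigmoid layer would be learned jointly with the separation network.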
Pages: 5393-5397
Number of pages: 5