Speech Separation for an Unknown Number of Speakers Using Transformers With Encoder-Decoder Attractors

Cited by: 1
Authors
Chetupalli, Srikanth Raj [1 ]
Habets, Emanuel A. P. [1 ]
Affiliations
[1] International Audio Laboratories Erlangen, Am Wolfsmantel 33, D-91058 Erlangen, Germany
Source
INTERSPEECH 2022
Keywords
source separation; speaker counting; attractors; transformers
DOI
10.21437/Interspeech.2022-10849
Chinese Library Classification
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
Speaker-independent speech separation of single-channel mixtures with an unknown number of speakers in the waveform domain is considered in this paper. To handle the unknown number of sources, we incorporate an encoder-decoder attractor (EDA) module into a speech separation network. The neural network architecture consists of a trainable encoder-decoder pair and a masking network. The masking network in the proposed approach is inspired by the transformer-based SepFormer separation system. It contains a dual-path block and a triple-path block, each modeling both short-time and long-time dependencies in the signal. The EDA module first summarises the dual-path block output using an LSTM encoder and then generates one attractor vector per speaker in the mixture using an LSTM decoder. The attractors are combined with the dual-path block output to generate speaker channels, which are processed jointly by the triple-path block to predict the masks. In addition, a linear-sigmoid layer, with the attractors as input, predicts a binary output that serves as the stopping criterion for attractor generation. The proposed approach is evaluated on the WSJ0-mix dataset with mixtures of up to five speakers. State-of-the-art results are obtained in both separation quality and speaker counting across all mixture types.
Pages: 5393-5397
Page count: 5
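
The attractor mechanism described in the abstract above can be illustrated with a minimal PyTorch sketch. It loosely follows the EEND-EDA formulation that the paper builds on: an LSTM encoder summarises the dual-path block output, an LSTM decoder emits one attractor per decoding step, and a linear-sigmoid head scores whether a further speaker exists. The class name, layer sizes, and the zero-vector decoder input are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class EncoderDecoderAttractor(nn.Module):
    # Hedged sketch of the EDA module: encoder summary -> decoder attractors
    # -> linear-sigmoid existence probabilities (stopping criterion).
    def __init__(self, feat_dim: int = 128, hidden_dim: int = 128):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, feat_dim)   # attractor projection
        self.exist = nn.Linear(feat_dim, 1)           # linear-sigmoid head

    def forward(self, feats: torch.Tensor, max_speakers: int = 5):
        # feats: (batch, time, feat_dim), the dual-path block output.
        _, state = self.encoder(feats)                # keep final (h, c) only
        zeros = feats.new_zeros(feats.size(0), 1, feats.size(2))
        attractors, exist_probs = [], []
        for _ in range(max_speakers + 1):             # +1 for a stop attractor
            out, state = self.decoder(zeros, state)   # one step per speaker
            a = self.proj(out)                        # (batch, 1, feat_dim)
            attractors.append(a)
            exist_probs.append(torch.sigmoid(self.exist(a)))
        return (torch.cat(attractors, dim=1),         # (batch, S+1, feat_dim)
                torch.cat(exist_probs, dim=1).squeeze(-1))

if __name__ == "__main__":
    eda = EncoderDecoderAttractor()
    feats = torch.randn(2, 200, 128)                  # toy dual-path output
    attractors, probs = eda(feats)
    print(attractors.shape, probs.shape)              # (2, 6, 128) (2, 6)

At inference, attractor generation would stop at the first attractor whose existence probability falls below a threshold (e.g. 0.5); the retained attractors are then combined with the dual-path block output to form the speaker channels processed by the triple-path block.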