Robust End-to-end Speaker Diarization with Generic Neural Clustering

Cited by: 1
Authors
Yang, Chenyu [1 ]
Wang, Yu [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr, Shanghai, Peoples R China
[2] Shanghai AI Lab, Shanghai, Peoples R China
Source
INTERSPEECH 2022
Funding
National Natural Science Foundation of China;
Keywords
Speaker diarization; end-to-end neural diarization; neural clustering;
DOI
10.21437/Interspeech.2022-10404
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
End-to-end speaker diarization approaches have shown exceptional performance over the traditional modular approaches. To further improve the performance of end-to-end speaker diarization on real speech recordings, works have recently been proposed that integrate unsupervised clustering algorithms with end-to-end neural diarization models. However, these methods have a number of drawbacks: 1) the unsupervised clustering algorithms cannot leverage the supervision available in annotated datasets; 2) the K-means-based unsupervised algorithms that are explored often suffer from the constraint violation problem; 3) there is an unavoidable mismatch between the supervised training and the unsupervised inference. In this paper, a robust generic neural clustering approach is proposed that can be integrated with any chunk-level predictor to form a fully supervised end-to-end speaker diarization model. Moreover, by leveraging the sequence modelling ability of a recurrent neural network, the proposed neural clustering approach can dynamically estimate the number of speakers during inference. Experimental results show that, when integrated with an attractor-based chunk-level predictor, the proposed neural clustering approach yields a lower Diarization Error Rate (DER) than constrained K-means-based clustering approaches under mismatched conditions.
Pages: 1471-1475
Number of pages: 5
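The abstract above describes the clustering module only at a high level, so the following is a minimal, purely illustrative sketch of how an RNN-based neural clustering step with a dynamically growing speaker count might look. It is not the authors' implementation; every class, method, and hyper-parameter name here (RNNNeuralClustering, emb_dim, hidden_dim, the bilinear scorer, the running-mean centroid update) is an assumption made for illustration only.

```python
# Hypothetical sketch (not the paper's code): an RNN reads chunk-level speaker
# embeddings in order and, for each one, either assigns it to an existing
# cluster or opens a new cluster, i.e. decides on the fly how many speakers exist.
import torch
import torch.nn as nn


class RNNNeuralClustering(nn.Module):
    """Sequentially assigns chunk-level embeddings to clusters (speakers)."""

    def __init__(self, emb_dim: int, hidden_dim: int = 256):
        super().__init__()
        # A GRU cell tracks the clustering state across the embedding sequence.
        self.rnn = nn.GRUCell(emb_dim, hidden_dim)
        # Scores similarity between the current embedding and a cluster centroid.
        self.score = nn.Bilinear(emb_dim, emb_dim, 1)
        # Scores the option of opening a brand-new cluster from the RNN state.
        self.new_cluster = nn.Linear(hidden_dim, 1)

    def forward(self, embeddings: torch.Tensor):
        # embeddings: (num_chunk_level_outputs, emb_dim), ordered in time.
        h = embeddings.new_zeros(1, self.rnn.hidden_size)
        centroids, counts, labels = [], [], []
        for e in embeddings:
            h = self.rnn(e.unsqueeze(0), h)
            # One logit for "new speaker" plus one logit per existing cluster.
            logits = [self.new_cluster(h).squeeze(0)]
            for c in centroids:
                logits.append(self.score(e.unsqueeze(0), c.unsqueeze(0)).squeeze(0))
            choice = torch.cat(logits).argmax().item()
            if choice == 0:              # open a new cluster (new speaker)
                centroids.append(e.clone())
                counts.append(1)
                labels.append(len(centroids) - 1)
            else:                        # assign to existing cluster `choice - 1`
                k = choice - 1
                counts[k] += 1
                # Running-mean update of the cluster centroid.
                centroids[k] = centroids[k] + (e - centroids[k]) / counts[k]
                labels.append(k)
        return labels                    # one speaker index per input embedding


# Toy usage with random chunk-level embeddings (purely illustrative).
if __name__ == "__main__":
    clusterer = RNNNeuralClustering(emb_dim=32)
    fake_embeddings = torch.randn(10, 32)
    print(clusterer(fake_embeddings))    # e.g. [0, 1, 0, 2, ...]
```

In a design like this, the "fully supervised" training mentioned in the abstract could, for instance, train the same logits with a cross-entropy loss against ground-truth speaker assignments, reserving the hard argmax shown above for inference; this is one plausible reading of the abstract, not a description of the paper's actual method.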