Self-supervised Speaker Diarization

Cited by: 0
Authors
Dissen, Yehoshua [1 ]
Kreuk, Felix [2 ]
Keshet, Joseph [1 ]
Affiliations
[1] Technion Israel Inst Technol, Fac Elect & Comp Engn, Haifa, Israel
[2] Bar Ilan Univ, Dept Comp Sci, Ramat Gan, Israel
Source
INTERSPEECH 2022
Keywords
speaker diarization; self-supervised training; unsupervised PLDA
DOI
10.21437/Interspeech.2022-777
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Over the last few years, deep learning has grown in popularity for speaker verification, identification, and diarization. A significant part of this success is due to the demonstrated effectiveness of the learned speaker representations. These, however, are heavily dependent on large amounts of annotated data and can be sensitive to new domains. This study proposes an entirely unsupervised deep-learning model for speaker diarization. Specifically, it focuses on generating high-quality neural speaker representations without any annotated data, as well as on estimating the model's secondary hyperparameters without annotations. The speaker embeddings are produced by an encoder trained in a self-supervised fashion on pairs of adjacent segments assumed to come from the same speaker. The trained encoder is then used to self-generate pseudo-labels, which in turn serve to train a similarity score between different segments of the same call using probabilistic linear discriminant analysis (PLDA) and, further, to learn a clustering stopping threshold. We compared our model to state-of-the-art unsupervised as well as supervised baselines on the CallHome benchmarks. Empirically, our approach outperforms unsupervised methods when only two speakers are present in a call, and is only slightly worse than recent supervised models.
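The two core ideas in the abstract can be illustrated with a minimal, hypothetical sketch: (1) adjacent segments are paired as assumed same-speaker positives for self-supervision, and (2) segments are grouped by agglomerative clustering cut at a distance threshold (standing in for the self-estimated stopping threshold). All names and data below are toy stand-ins, and plain cosine distance replaces the paper's PLDA scoring; this is not the authors' actual pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Toy stand-ins for encoder embeddings: two speakers talking in long
# turns, so adjacent segments usually share a speaker -- the assumption
# the self-supervised pair sampling relies on.
turns = [0] * 5 + [1] * 5                       # 10 segments, one speaker change
means = {0: np.full(8, -2.0), 1: np.full(8, 2.0)}
embs = np.stack([means[s] + 0.1 * rng.standard_normal(8) for s in turns])

# Step 1 (self-supervision): adjacent segments form assumed same-speaker
# pairs; pair (4, 5) crosses the speaker change and is label noise.
pos_pairs = [(i, i + 1) for i in range(len(embs) - 1)]

# Step 2 (clustering): average-linkage agglomerative clustering, stopped
# by a distance threshold (hand-picked here; self-estimated in the paper).
Z = linkage(embs, method="average", metric="cosine")
labels = fcluster(Z, t=0.5, criterion="distance")
```

With well-separated toy speakers, the threshold cut recovers the two speaker clusters even though one of the nine sampled positive pairs is mislabeled, which mirrors why the adjacency assumption is tolerable in practice.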
Pages: 4013-4017 (5 pages)
Related papers (50 in total)
  • [1] Self-Supervised Learning for Online Speaker Diarization
    Chien, Jen-Tzung; Luo, Sixun
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021: 2036-2042
  • [2] Deep Self-Supervised Hierarchical Clustering for Speaker Diarization
    Singh, Prachi; Ganapathy, Sriram
    INTERSPEECH 2020, 2020: 294-298
  • [3] Self-Supervised Learning for Audio-Visual Speaker Diarization
    Ding, Yifan; Xu, Yong; Zhang, Shi-Xiong; Cong, Yahuan; Wang, Liqiang
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 4367-4371
  • [4] Self-Supervised Metric Learning with Graph Clustering for Speaker Diarization
    Singh, Prachi; Ganapathy, Sriram
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021: 90-97
  • [5] Self-Supervised Representation Learning with Path Integral Clustering for Speaker Diarization
    Singh, Prachi; Ganapathy, Sriram
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29: 1639-1649
  • [6] Continual Self-Supervised Domain Adaptation for End-to-End Speaker Diarization
    Coria, Juan M.; Bredin, Herve; Ghannay, Sahar; Rosset, Sophie
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022: 626-632
  • [7] Self-Supervised Speaker Embeddings
    Stafylakis, Themos; Rohdin, Johan; Plchot, Oldrich; Mizera, Petr; Burget, Lukas
    INTERSPEECH 2019, 2019: 2863-2867
  • [8] Self-Supervised Speaker Verification with Simple Siamese Network and Self-Supervised Regularization
    Sang, Mufan; Li, Haoqi; Liu, Fang; Arnold, Andrew O.; Wan, Li
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 6127-6131
  • [9] Fully Supervised Speaker Diarization
    Zhang, Aonan; Wang, Quan; Zhu, Zhenyao; Paisley, John; Wang, Chong
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 6301-6305
  • [10] Implicit Self-Supervised Language Representation for Spoken Language Diarization
    Mishra, Jagabandhu; Prasanna, S. R. Mahadeva
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 3393-3407