SELF-SUPERVISED METRIC LEARNING WITH GRAPH CLUSTERING FOR SPEAKER DIARIZATION

Cited by: 5
Authors
Singh, Prachi [1 ]
Ganapathy, Sriram [1 ]
Affiliations
[1] Indian Inst Sci, Learning & Extract Acoust Patterns LEAP Lab, Elect Engn, Bangalore, Karnataka, India
Keywords
Speaker diarization; x-vectors; path integral clustering; neural PLDA; self-supervised learning
DOI
10.1109/ASRU51503.2021.9688271
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel algorithm for speaker diarization using metric learning for graph-based clustering. Graph clustering algorithms operate on an adjacency matrix of similarity scores. These scores are computed between speaker embeddings extracted from pairs of audio segments within the given recording. Our approach jointly learns the speaker embeddings and the similarity metric using principles of self-supervised learning. The metric learning network implements a neural model of probabilistic linear discriminant analysis (PLDA). The self-supervision is derived from the pseudo labels obtained from a previous iteration of clustering. The entire model, covering both representation learning and metric learning, is trained with a binary cross-entropy loss. By combining the self-supervision-based metric learning with the graph-based clustering algorithm, we achieve significant relative improvements of 60% and 7% over the x-vector PLDA agglomerative hierarchical clustering (AHC) approach on the AMI and DIHARD datasets, respectively, in terms of diarization error rate (DER).
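The two ingredients the abstract names, an adjacency matrix of pairwise similarity scores and a binary cross-entropy loss driven by pseudo labels, can be illustrated with a small sketch. This is not the authors' implementation: cosine similarity stands in for the neural PLDA score, the embeddings and pseudo labels are toy values, and the function names `adjacency` and `pairwise_bce` are hypothetical. The clustering step itself (path integral clustering) is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors (stand-in for a PLDA score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def adjacency(embeddings):
    """Adjacency matrix of similarity scores over all pairs of segments."""
    n = len(embeddings)
    return [[cosine(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_bce(scores, pseudo_labels):
    """Mean binary cross-entropy over segment pairs.

    The target for a pair is 1 when both segments carry the same pseudo label
    (assumed to come from a previous clustering pass) and 0 otherwise.
    """
    n = len(pseudo_labels)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            target = 1.0 if pseudo_labels[i] == pseudo_labels[j] else 0.0
            p = sigmoid(scores[i][j])
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
            count += 1
    return total / count

# Toy data: four 2-D "embeddings" and pseudo labels for two speakers.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_pseudo_labels = [0, 0, 1, 1]

A = adjacency(toy_embeddings)
print("loss:", pairwise_bce(A, toy_pseudo_labels))
```

In the setting the abstract describes, the scoring function and the embeddings would both be trainable and the loss would be minimized by gradient descent, with the pseudo labels refreshed after each clustering pass; the sketch only evaluates the loss once for fixed scores.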
Pages: 90-97
Number of pages: 8
Related papers
50 in total
  • [1] Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization
    Singh, Prachi
    Ganapathy, Sriram
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1639 - 1649
  • [2] Deep Self-Supervised Hierarchical Clustering for Speaker Diarization
    Singh, Prachi
    Ganapathy, Sriram
    [J]. INTERSPEECH 2020, 2020, : 294 - 298
  • [3] Self-Supervised Learning for Online Speaker Diarization
    Chien, Jen-Tzung
    Luo, Sixun
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 2036 - 2042
  • [4] Self-supervised Speaker Diarization
    Dissen, Yehoshua
    Kreuk, Felix
    Keshet, Joseph
    [J]. INTERSPEECH 2022, 2022, : 4013 - 4017
  • [5] SELF-SUPERVISED LEARNING FOR AUDIO-VISUAL SPEAKER DIARIZATION
    Ding, Yifan
    Xu, Yong
    Zhang, Shi-Xiong
    Cong, Yahuan
    Wang, Liqiang
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4367 - 4371
  • [6] A self-supervised learning model for graph clustering optimization problems
    Cai, Qingqiong
    Guo, Xingyue
    Huang, Shenwei
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 290
  • [7] Self-Supervised Clustering based on Manifold Learning and Graph Convolutional Networks
    Lopes, Leonardo Tadeu
    Guimaraes Pedronette, Daniel Carlos
    [J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5623 - 5632
  • [8] Redundancy-Free Self-Supervised Relational Learning for Graph Clustering
    Yi, Siyu
    Ju, Wei
    Qin, Yifang
    Luo, Xiao
    Liu, Luchen
    Zhou, Yongdao
    Zhang, Ming
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, : 1 - 15
  • [9] CONTINUAL SELF-SUPERVISED DOMAIN ADAPTATION FOR END-TO-END SPEAKER DIARIZATION
    Coria, Juan M.
    Bredin, Herve
    Ghannay, Sahar
    Rosset, Sophie
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 626 - 632
  • [10] Fast Self-Supervised Clustering With Anchor Graph
    Wang, Jingyu
    Ma, Zhenyu
    Nie, Feiping
    Li, Xuelong
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (09) : 4199 - 4212