SPEAKER DIARIZATION WITH SESSION-LEVEL SPEAKER EMBEDDING REFINEMENT USING GRAPH NEURAL NETWORKS

Cited by: 0
Authors
Wang, Jixuan [1, 2]
Xiao, Xiong [3]
Wu, Jian [3]
Ramamurthy, Ranjani [3]
Rudzicz, Frank [1, 2]
Brudno, Michael [1, 2]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] Microsoft, Redmond, WA, USA
Keywords
Speaker diarization; graph neural networks; deep speaker embedding
DOI
10.1109/icassp40776.2020.9054176
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep speaker embedding models have been commonly used as a building block for speaker diarization systems; however, the speaker embedding model is usually trained according to a global loss defined on the training data, which could be sub-optimal for distinguishing speakers locally in a specific meeting session. In this work we present the first use of graph neural networks (GNNs) for the speaker diarization problem, utilizing a GNN to refine speaker embeddings locally using the structural information between speech segments inside each session. The speaker embeddings extracted by a pre-trained model are remapped into a new embedding space, in which the different speakers within a single session are better separated. The model is trained for linkage prediction in a supervised manner by minimizing the difference between the affinity matrix constructed by the refined embeddings and the ground-truth adjacency matrix. Spectral clustering is then applied on top of the refined embeddings. We show that the clustering performance of the refined speaker embeddings outperforms the original embeddings significantly on both simulated and real meeting data, and our system achieves the state-of-the-art result on the NIST SRE 2000 CALLHOME database.
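The training objective described in the abstract compares an affinity matrix built from the refined embeddings against a ground-truth adjacency matrix. The following is a minimal NumPy sketch of that comparison, not the authors' implementation. Several pieces are assumptions made for illustration only: cosine similarity as the affinity, mean squared error as the "difference", and the hypothetical `smooth_once` function, which is an untrained neighbourhood-averaging step standing in for a learned GNN layer. The abstract does not specify any of these.

```python
import numpy as np


def cosine_affinity(emb):
    """Pairwise cosine similarity between segment embeddings, shape (n, n)."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def ground_truth_adjacency(labels):
    """Entry (i, j) is 1 when segments i and j share a speaker, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def linkage_loss(emb, labels):
    """Mean squared difference between affinity and adjacency (assumed loss)."""
    diff = cosine_affinity(emb) - ground_truth_adjacency(labels)
    return float(np.mean(diff ** 2))


def smooth_once(emb, threshold=0.5):
    """Hypothetical stand-in for a GNN layer: average each embedding with
    the neighbours whose cosine affinity is at least `threshold`."""
    adj = (cosine_affinity(emb) >= threshold).astype(float)
    adj = np.maximum(adj, np.eye(len(emb)))  # always keep a self-loop
    adj /= adj.sum(axis=1, keepdims=True)    # row-normalise the graph
    return adj @ emb


# Toy session: four segments, two speakers (made-up numbers).
emb = np.array([[1.0, 0.2], [1.0, -0.2], [-0.2, 1.0], [0.2, 1.0]])
labels = [0, 0, 1, 1]
print("loss before smoothing:", linkage_loss(emb, labels))
print("loss after smoothing: ", linkage_loss(smooth_once(emb), labels))
```

In a trained system the averaging step would be replaced by learned GNN layers, and the refined embeddings (or their affinity matrix) would then be passed to spectral clustering, as the abstract states.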
Pages: 7109-7113
Page count: 5
Related papers
50 in total
  • [1] Speaker Diarization with Session-Level Speaker Embedding Refinement Using Graph Neural Networks
    Wang, Jixuan
    Xiao, Xiong
    Wu, Jian
    Ramamurthy, Ranjani
    Rudzicz, Frank
    Brudno, Michael
    [J]. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2020, 2020-May : 7109 - 7113
  • [2] Speaker diarization using autoassociative neural networks
    Jothilakshmi, S.
    Ramalingam, V.
    Palanivel, S.
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2009, 22 (4-5) : 667 - 675
  • [3] Speaker Diarization using Embedding Vectors
    Toruk, Mesut
    Bilgin, Gokhan
    Serbes, Ahmet
    [J]. 2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,
  • [4] Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings
    Cyrta, Pawel
    Trzcinski, Tomasz
    Stokowiec, Wojciech
    [J]. INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, PT I, 2018, 655 : 107 - 117
  • [5] Speaker Diarization Using Convolutional Neural Network for Statistics Accumulation Refinement
    Zajic, Zbynek
    Hruz, Marek
    Mueller, Ladek
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3562 - 3566
  • [6] JOINT SPEAKER DIARIZATION AND RECOGNITION USING CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS
    Zhou, Zhihan
    Zhang, Yichi
    Duan, Zhiyao
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2496 - 2500
  • [7] Online Neural Speaker Diarization With Target Speaker Tracking
    Wang, Weiqing
    Li, Ming
    [J]. IEEE/ACM Transactions on Audio Speech and Language Processing, 2024, 32 : 5078 - 5091
  • [8] A Modified Approach to Cluster Refinement for Speaker Diarization
    Zhu, Liping
    [J]. PROCEEDINGS OF 2015 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2015), 2015, : 1457 - 1460
  • [9] Speaker Diarization Based on Locally Linear Embedding
    Shahar, Ori
    Twito, Lee
    Spingarn, Nurit
    Cohen, Israel
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON THE SCIENCE OF ELECTRICAL ENGINEERING (ICSEE), 2016,
  • [10] ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding
    He, Mao-Kui
    Du, Jun
    Liu, Qing-Feng
    Lee, Chin-Hui
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1561 - 1573