Bayesian HMM based x-vector clustering for Speaker Diarization

Cited by: 29
Authors
Diez, Mireia [1 ]
Burget, Lukas [1 ]
Wang, Shuai [1 ,2 ]
Rohdin, Johan [1 ]
Cernocky, Jan [1 ]
Affiliations
[1] Brno Univ Technol, Fac Informat Technol, IT4I Ctr Excellence, Brno, Czech Republic
[2] Shanghai Jiao Tong Univ, Speechlab, Dept Comp Sci & Engn, Shanghai, Peoples R China
Source
INTERSPEECH 2019
Funding
European Union Horizon 2020; U.S. National Science Foundation
Keywords
Speaker Diarization; Variational Bayes; HMM; x-vector; DIHARD
DOI
10.21437/Interspeech.2019-2813
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
This paper presents a simplified version of the previously proposed diarization algorithm based on Bayesian Hidden Markov Models, which uses Variational Bayesian inference for very fast and robust clustering of x-vectors (neural-network-based speaker embeddings). The presented results show that this clustering algorithm provides significant improvements in diarization performance compared to the previously used Agglomerative Hierarchical Clustering. The output of this system can further be used to initialize a second-stage VB diarization system that takes frame-wise MFCC features as input, to obtain the best results.
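The abstract describes clustering x-vectors with a Bayesian HMM whose parameters are inferred by Variational Bayes. The Python sketch below illustrates that idea under stated assumptions; it is loosely modelled on the VBx formulation published after this paper and is not the authors' exact algorithm. All function names (`vb_hmm_cluster`, `forward_backward`, `logsumexp`) and parameters (`phi`, `loop_prob`, `iters`) are hypothetical. It assumes the x-vectors have already been linearly transformed so that the within-speaker covariance is the identity and the between-speaker covariance is diagonal (`phi`), that each HMM state is a speaker whose mean has a standard-normal prior, that the chain stays with the current speaker with probability `loop_prob`, and that initial frame-to-speaker responsibilities come from some earlier clustering pass.

```python
import math


def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def forward_backward(logp, loop_prob, pi):
    """Log-domain forward-backward for a speaker HMM (hypothetical helper).

    From any speaker the chain stays with probability loop_prob, or jumps
    to speaker j (possibly the same one) with probability (1 - loop_prob) * pi[j].
    """
    T, S = len(logp), len(pi)
    log_tr = [[math.log(max((loop_prob if i == j else 0.0) + (1 - loop_prob) * pi[j], 1e-300))
               for j in range(S)] for i in range(S)]
    la = [[math.log(max(pi[s], 1e-300)) + logp[0][s] for s in range(S)]]
    for t in range(1, T):
        la.append([logsumexp([la[t - 1][i] + log_tr[i][j] for i in range(S)]) + logp[t][j]
                   for j in range(S)])
    lb = [[0.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        lb[t] = [logsumexp([log_tr[i][j] + logp[t + 1][j] + lb[t + 1][j] for j in range(S)])
                 for i in range(S)]
    log_px = logsumexp(la[-1])
    gamma = [[math.exp(la[t][s] + lb[t][s] - log_px) for s in range(S)] for t in range(T)]
    return gamma, la, lb, log_px


def vb_hmm_cluster(X, phi, gamma, loop_prob=0.9, iters=10):
    """Simplified VB-HMM clustering of x-vectors (illustrative sketch).

    Assumption: X is pre-transformed so within-speaker covariance is identity
    and between-speaker covariance is diag(phi); speaker s has mean
    sqrt(phi) * y_s with prior y_s ~ N(0, I). gamma holds initial
    frame-to-speaker responsibilities (T rows, S columns).
    """
    T, D, S = len(X), len(phi), len(gamma[0])
    pi = [1.0 / S] * S
    rho = [[x[d] * math.sqrt(phi[d]) for d in range(D)] for x in X]
    G = [-0.5 * (sum(v * v for v in x) + D * math.log(2 * math.pi)) for x in X]
    for _ in range(iters):
        # VB update of q(y_s) = N(alpha_s, diag(inv_l_s)) from soft counts.
        counts = [sum(gamma[t][s] for t in range(T)) for s in range(S)]
        inv_l = [[1.0 / (1.0 + counts[s] * phi[d]) for d in range(D)] for s in range(S)]
        alpha = [[inv_l[s][d] * sum(gamma[t][s] * rho[t][d] for t in range(T))
                  for d in range(D)] for s in range(S)]
        # Expected log-likelihood of each x-vector under each speaker model.
        logp = [[G[t] + sum(rho[t][d] * alpha[s][d]
                            - 0.5 * phi[d] * (alpha[s][d] ** 2 + inv_l[s][d]) for d in range(D))
                 for s in range(S)] for t in range(T)]
        old_pi = pi
        gamma, la, lb, log_px = forward_backward(logp, loop_prob, old_pi)
        # Re-estimate speaker priors: initial occupancy plus expected number of
        # jumps into each speaker; rarely used speakers shrink towards zero.
        lse_prev = [logsumexp(row) for row in la]
        new_pi = []
        for s in range(S):
            log_jump = math.log((1 - loop_prob) * max(old_pi[s], 1e-300))
            jumps = sum(math.exp(log_jump + lse_prev[t - 1] + logp[t][s] + lb[t][s] - log_px)
                        for t in range(1, T))
            new_pi.append(gamma[0][s] + jumps)
        total = sum(new_pi)
        pi = [p / total for p in new_pi]
    return gamma, pi
```

To use it, pass one-hot or smoothed labels from a first-pass clustering as `gamma` and take the argmax of each returned row as that x-vector's speaker. Because the speaker priors `pi` are re-estimated, speakers that explain little of the data see their prior shrink toward zero, so over-estimating the number of speakers is tolerated to a degree, while the loop probability discourages implausibly rapid speaker switches.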
Pages: 346-350
Number of pages: 5