Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach

Cited by: 115
Authors
Shum, Stephen H. [1]
Dehak, Najim [1]
Dehak, Reda [2]
Glass, James R. [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] Lab Rech & Dev EPITA, F-94276 Paris, France
Keywords
Bayesian nonparametric inference; factor analysis; HDP-HMM; i-vectors; principal component analysis; speaker clustering; speaker diarization; spectral clustering; variational Bayes
DOI
10.1109/TASL.2013.2264673
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In speaker diarization, standard approaches typically perform speaker clustering on some initial segmentation before refining the segment boundaries in a re-segmentation step to obtain a final diarization hypothesis. In this paper, we integrate an improved clustering method with an existing re-segmentation algorithm and, in iterative fashion, optimize both speaker cluster assignments and segmentation boundaries jointly. For clustering, we extend our previous research using factor analysis for speaker modeling. In continuing to take advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features (i.e., i-vectors), we develop a probabilistic approach to speaker clustering by applying a Bayesian Gaussian Mixture Model (GMM) to principal component analysis (PCA)-processed i-vectors. We then utilize information at different temporal resolutions to arrive at an iterative optimization scheme that, in alternating between clustering and re-segmentation steps, demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner. Our proposed methods attain results that are comparable to those of a state-of-the-art benchmark set on the multi-speaker CallHome telephone corpus. We further compare our system with a Bayesian nonparametric approach to diarization and attempt to reconcile their differences in both methodology and performance.
Pages: 2015-2028
Page count: 14
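
The clustering step named in the abstract (a Bayesian GMM applied to PCA-processed i-vectors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the i-vectors are replaced by synthetic 20-dimensional vectors, scikit-learn's `BayesianGaussianMixture` stands in for the paper's variational Bayes GMM, and the function name `cluster_segments` and the parameters `n_pca_dims` and `max_speakers` are hypothetical choices made for this example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import BayesianGaussianMixture


def cluster_segments(ivectors, n_pca_dims=3, max_speakers=6, seed=0):
    """Assign one cluster label per segment vector.

    ivectors: array of shape (n_segments, dim). In a real system these
    would be i-vectors extracted per segment; here they are synthetic.
    """
    # Project the segment vectors onto their leading principal components.
    projected = PCA(n_components=n_pca_dims, random_state=seed).fit_transform(ivectors)

    # Fit a variational Bayesian GMM. max_speakers is only an upper bound:
    # the prior on the mixture weights can leave surplus components unused.
    gmm = BayesianGaussianMixture(
        n_components=max_speakers,
        covariance_type="full",
        weight_concentration_prior_type="dirichlet_distribution",
        max_iter=500,
        random_state=seed,
    )
    return gmm.fit_predict(projected)


# Synthetic stand-in data (assumption): 3 "speakers", 40 segments each.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 20))
ivectors = np.vstack([c + rng.normal(size=(40, 20)) for c in centers])

labels = cluster_segments(ivectors)
print("segments:", labels.shape[0], "clusters used:", len(set(labels.tolist())))
```

In the iterative scheme the abstract describes, labels like these would seed a re-segmentation step, and the refined segments would then be clustered again; that alternation is not shown here.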