Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach

Cited by: 115

Authors:

Shum, Stephen H. [1]

Dehak, Najim [1]

Dehak, Reda [2]

Glass, James R. [1]

Affiliations:

[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA

[2] Lab Rech & Dev EPITA, F-94276 Paris, France

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / No. 10

Keywords:

Bayesian nonparametric inference; factor analysis; HDP-HMM; i-vectors; principal component analysis; speaker clustering; speaker diarization; spectral clustering; variational Bayes;

DOI:

10.1109/TASL.2013.2264673

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In speaker diarization, standard approaches typically perform speaker clustering on some initial segmentation before refining the segment boundaries in a re-segmentation step to obtain a final diarization hypothesis. In this paper, we integrate an improved clustering method with an existing re-segmentation algorithm and, in iterative fashion, optimize both speaker cluster assignments and segmentation boundaries jointly. For clustering, we extend our previous research using factor analysis for speaker modeling. In continuing to take advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features (i.e., i-vectors), we develop a probabilistic approach to speaker clustering by applying a Bayesian Gaussian Mixture Model (GMM) to principal component analysis (PCA)-processed i-vectors. We then utilize information at different temporal resolutions to arrive at an iterative optimization scheme that, in alternating between clustering and re-segmentation steps, demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner. Our proposed methods attain results that are comparable to those of a state-of-the-art benchmark set on the multi-speaker CallHome telephone corpus. We further compare our system with a Bayesian nonparametric approach to diarization and attempt to reconcile their differences in both methodology and performance.
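The clustering step named in the abstract (a Bayesian GMM applied to PCA-processed i-vectors) can be illustrated with a minimal sketch. It is not the authors' implementation: scikit-learn's `PCA` and `BayesianGaussianMixture` are swapped in as stand-ins, random synthetic vectors take the place of real i-vectors, and the helper names (`cluster_segments`, `make_toy_vectors`) and every dimension and prior value are assumptions chosen only for illustration. The re-segmentation and iterative refinement steps are not shown.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import BayesianGaussianMixture


def cluster_segments(vectors, n_pca_dims=3, max_speakers=6, seed=0):
    """Cluster per-segment vectors; returns one integer label per segment.

    All parameter values here are illustrative assumptions, not values
    taken from the paper.
    """
    # Project the high-dimensional vectors onto a few principal components.
    projected = PCA(n_components=n_pca_dims, random_state=seed).fit_transform(vectors)
    # Variational Bayesian GMM: start with more components than expected
    # speakers; a small weight prior encourages unused components to shrink.
    bgmm = BayesianGaussianMixture(
        n_components=max_speakers,
        covariance_type="diag",
        weight_concentration_prior_type="dirichlet_distribution",
        weight_concentration_prior=0.01,
        max_iter=500,
        random_state=seed,
    )
    return bgmm.fit_predict(projected)


def make_toy_vectors(n_speakers=3, per_speaker=40, dim=20, seed=0):
    """Synthetic stand-in for i-vectors: one Gaussian blob per speaker."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=10.0, size=(n_speakers, dim))
    truth = np.repeat(np.arange(n_speakers), per_speaker)
    vectors = centers[truth] + rng.normal(scale=1.0, size=(truth.size, dim))
    return vectors, truth


vectors, truth = make_toy_vectors()
labels = cluster_segments(vectors)
print("components actually used:", np.unique(labels).size)
```

The component count is set above the expected number of speakers on purpose: in this sketch the variational posterior, not the caller, decides how many components keep non-negligible weight, which is one way to read "probabilistic approach to speaker clustering" in an unsupervised setting.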

Pages: 2015-2028

Page count: 14
