Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach

Cited by: 115

Authors:

Shum, Stephen H. [1]

Dehak, Najim [1]

Dehak, Reda [2]

Glass, James R. [1]

Affiliations:

[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA

[2] Lab Rech & Dev EPITA, F-94276 Paris, France

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013, Vol. 21, No. 10

Keywords:

Bayesian nonparametric inference; factor analysis; HDP-HMM; i-vectors; principal component analysis; speaker clustering; speaker diarization; spectral clustering; variational Bayes;

DOI:

10.1109/TASL.2013.2264673

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In speaker diarization, standard approaches typically perform speaker clustering on some initial segmentation before refining the segment boundaries in a re-segmentation step to obtain a final diarization hypothesis. In this paper, we integrate an improved clustering method with an existing re-segmentation algorithm and, in iterative fashion, optimize both speaker cluster assignments and segmentation boundaries jointly. For clustering, we extend our previous research using factor analysis for speaker modeling. In continuing to take advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features (i.e., i-vectors), we develop a probabilistic approach to speaker clustering by applying a Bayesian Gaussian Mixture Model (GMM) to principal component analysis (PCA)-processed i-vectors. We then utilize information at different temporal resolutions to arrive at an iterative optimization scheme that, in alternating between clustering and re-segmentation steps, demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner. Our proposed methods attain results that are comparable to those of a state-of-the-art benchmark set on the multi-speaker CallHome telephone corpus. We further compare our system with a Bayesian nonparametric approach to diarization and attempt to reconcile their differences in both methodology and performance.


Pages: 2015-2028

Page count: 14
