Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach

Cited by: 115
Authors
Shum, Stephen H. [1]
Dehak, Najim [1]
Dehak, Reda [2]
Glass, James R. [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] Lab Rech & Dev EPITA, F-94276 Paris, France
Keywords
Bayesian nonparametric inference; factor analysis; HDP-HMM; i-vectors; principal component analysis; speaker clustering; speaker diarization; spectral clustering; variational Bayes
DOI
10.1109/TASL.2013.2264673
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In speaker diarization, standard approaches typically perform speaker clustering on some initial segmentation before refining the segment boundaries in a re-segmentation step to obtain a final diarization hypothesis. In this paper, we integrate an improved clustering method with an existing re-segmentation algorithm and, in iterative fashion, optimize both speaker cluster assignments and segmentation boundaries jointly. For clustering, we extend our previous research using factor analysis for speaker modeling. In continuing to take advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features (i.e., i-vectors), we develop a probabilistic approach to speaker clustering by applying a Bayesian Gaussian Mixture Model (GMM) to principal component analysis (PCA)-processed i-vectors. We then utilize information at different temporal resolutions to arrive at an iterative optimization scheme that, in alternating between clustering and re-segmentation steps, demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner. Our proposed methods attain results that are comparable to those of a state-of-the-art benchmark set on the multi-speaker CallHome telephone corpus. We further compare our system with a Bayesian nonparametric approach to diarization and attempt to reconcile their differences in both methodology and performance.
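The clustering step described in the abstract (a Bayesian GMM applied to PCA-processed i-vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's `PCA` and `BayesianGaussianMixture` as stand-ins, uses synthetic vectors in place of real i-vectors, and omits the re-segmentation step entirely. The helper names (`make_toy_ivectors`, `cluster_segments`), the length normalization, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import BayesianGaussianMixture


def make_toy_ivectors(n_speakers=3, segs_per_speaker=40, dim=20, seed=0):
    """Synthetic stand-ins for per-segment i-vectors (hypothetical data)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_speakers, dim))        # one center per speaker
    true_ids = np.repeat(np.arange(n_speakers), segs_per_speaker)
    noise = 0.2 * rng.normal(size=(true_ids.size, dim))  # within-speaker scatter
    return centers[true_ids] + noise, true_ids


def cluster_segments(ivectors, max_speakers=6, n_pca_dims=3, seed=0):
    """Return one integer cluster label per segment."""
    # Assumption: length-normalize each vector before PCA.
    unit = ivectors / np.linalg.norm(ivectors, axis=1, keepdims=True)
    # Project onto the leading principal components.
    reduced = PCA(n_components=n_pca_dims, random_state=seed).fit_transform(unit)
    # Variational Bayesian GMM: max_speakers is only an upper bound; a small
    # weight concentration prior favors leaving surplus components unused.
    bgmm = BayesianGaussianMixture(
        n_components=max_speakers,
        covariance_type="diag",
        weight_concentration_prior_type="dirichlet_distribution",
        weight_concentration_prior=1.0 / max_speakers,
        max_iter=500,
        random_state=seed,
    )
    return bgmm.fit_predict(reduced)


ivectors, true_ids = make_toy_ivectors()
labels = cluster_segments(ivectors)
print("clusters used:", np.unique(labels).size)
```

In a full system of the kind the abstract outlines, labels like these would seed a re-segmentation pass, and the two steps would then alternate; that loop is outside the scope of this sketch.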
Pages: 2015-2028
Number of pages: 14
Related papers
50 in total
  • [31] An Analysis of Speaker Diarization Fusion Methods For The First DIHARD Challenge
    Yin, Bing
    Du, Jun
    Sun, Lei
    Zhang, Xueyang
    He, Shan
    Ling, Zhenhua
    Hu, Guoping
    Guo, Wu
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1473 - 1477
  • [32] SPEAKER DIARIZATION USING UNSUPERVISED DISCRIMINANT ANALYSIS OF INTER-CHANNEL DELAY FEATURES
    Evans, Nicholas W. D.
    Fredouille, Corinne
    Bonastre, Jean-Francois
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4061 - +
  • [33] New experiments on speaker diarization for unsupervised speaking style voice building for speech synthesis
    Martinez-Gonzalez, Beatriz
    Manuel Pardo, Jose
    Echeverry-Correa, J. D.
    Montero, J. M.
    [J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2014, (52): : 77 - 84
  • [34] Step-by-step and integrated approaches in broadcast news speaker diarization
    Meignier, S
    Moraru, D
    Fredouille, C
    Bonastre, JF
    Besacier, L
    [J]. COMPUTER SPEECH AND LANGUAGE, 2006, 20 (2-3): : 303 - 330
  • [35] Optimized speaker change detection approach for speaker segmentation towards speaker diarization based on deep learning
    VijayKumar, K.
    Rao, R. Rajeswara
    [J]. DATA & KNOWLEDGE ENGINEERING, 2023, 144
  • [36] An Iterative Speaker Re-Diarization Scheme for Improving Speaker-Based Entity Extraction in Multimedia Archives
    Ghaemmaghami, Houman
    Dean, David
    Sridharan, Sridha
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 577 - 581
  • [37] A Multimodal Approach to Speaker Diarization on TV Talk-Shows
    Vallet, Felicien
    Essid, Slim
    Carrive, Jean
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2013, 15 (03) : 509 - 520
  • [38] Probability Mass Function Distance Approach for Fast Speaker Diarization
    Arslan, Levent
    Sarar, Gokce
    Demirbag, Sedat
    Erden, Mustafa
    [J]. 2017 25TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2017,
  • [39] TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge
    Pang, Bowen
    Zhao, Huan
    Zhang, Gaosheng
    Yang, Xiaoyue
    Sun, Yang
    Zhang, Li
    Wang, Qing
    Xie, Lei
    [J]. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 502 - 506
  • [40] A simple approach to unsupervised speaker indexing
    Ofoegbu, Uchechukwu O.
    Iyer, Ananth N.
    Yantorno, Robert E.
    Smolenski, Brett Y.
    [J]. 2006 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATIONS, VOLS 1 AND 2, 2006, : 315 - 318