Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach

被引:115
|
作者
Shum, Stephen H. [1 ]
Dehak, Najim [1 ]
Dehak, Reda [2 ]
Glass, James R. [1 ]
机构
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] Lab Rech & Dev EPITA, F-94276 Paris, France
关键词
Bayesian nonparametric inference; factor analysis; HDP-HMM; i-vectors; principal component analysis; speaker clustering; speaker diarization; spectral clustering; variational Bayes;
D O I
10.1109/TASL.2013.2264673
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
In speaker diarization, standard approaches typically perform speaker clustering on some initial segmentation before refining the segment boundaries in a re-segmentation step to obtain a final diarization hypothesis. In this paper, we integrate an improved clustering method with an existing re-segmentation algorithm and, in iterative fashion, optimize both speaker cluster assignments and segmentation boundaries jointly. For clustering, we extend our previous research using factor analysis for speaker modeling. In continuing to take advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features (i.e., i-vectors), we develop a probabilistic approach to speaker clustering by applying a Bayesian Gaussian Mixture Model (GMM) to principal component analysis (PCA)-processed i-vectors. We then utilize information at different temporal resolutions to arrive at an iterative optimization scheme that, in alternating between clustering and re-segmentation steps, demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner. Our proposed methods attain results that are comparable to those of a state-of-the-art benchmark set on the multi-speaker CallHome telephone corpus. We further compare our system with a Bayesian nonparametric approach to diarization and attempt to reconcile their differences in both methodology and performance.
引用
收藏
页码:2015 / 2028
页数:14
相关论文
共 50 条
  • [21] An Information Theoretic Approach to Speaker Diarization of Meeting Data
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (07): : 1382 - 1393
  • [22] A Hybrid Generative-Discriminative Approach to Speaker Diarization
    Noulas, Athanasios K.
    van Kasteren, Tim
    Kroese, Ben J. A.
    [J]. MACHINE LEARNING FOR MULTIMODAL INTERACTION, PROCEEDINGS, 2008, 5237 : 98 - 109
  • [23] COMBINING SGMM SPEAKER VECTORS AND KL-HMM APPROACH FOR SPEAKER DIARIZATION
    Madikeri, Srikanth
    Motlicek, Petr
    Bourlard, Herve
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4834 - 4838
  • [24] SPEAKER DIARIZATION THROUGH SPEAKER EMBEDDINGS
    Rouvier, Mickael
    Bousquet, Pierre-Michel
    Favre, Benoit
    [J]. 2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2082 - 2086
  • [25] SPEAKER DIARIZATION WITH LSTM
    Wang, Quan
    Downey, Carlton
    Wan, Li
    Mansfield, Philip Andrew
    Moreno, Ignacio Lopez
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5239 - 5243
  • [26] Multimodal Speaker Diarization
    Noulas, Athanasios
    Englebienne, Gwenn
    Krose, Ben J. A.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (01) : 79 - 93
  • [27] Trainable Speaker Diarization
    Aronowitz, Hagai
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2021 - 2024
  • [28] Randomization Effect on Iterative-Based Speaker Diarization System for Telephone Conversations
    Furmanov, Tal
    Aminov, Lidiya
    Moyal, Ami
    Lapidot, Itshak
    [J]. 2014 IEEE 28TH CONVENTION OF ELECTRICAL & ELECTRONICS ENGINEERS IN ISRAEL (IEEEI), 2014,
  • [29] DIVE: END-TO-END SPEECH DIARIZATION VIA ITERATIVE SPEAKER EMBEDDING
    Zeghidour, Neil
    Teboul, Olivier
    Grangier, David
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 702 - 709
  • [30] End-to-end neural speaker diarization with an iterative adaptive attractor estimation
    Hao, Fengyuan
    Li, Xiaodong
    Zheng, Chengshi
    [J]. NEURAL NETWORKS, 2023, 166 : 566 - 578