Discriminative Training for Hierarchical Clustering in Speaker Diarization

Authors
Vinyals, Oriol [1, 2]
Friedland, Gerald [2]
Morgan, Nelson [1, 2]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Int Comp Sci Inst, Berkeley, CA USA
Funding
Swiss National Science Foundation
Keywords
Discriminative learning; Maximum Mutual Information; Speaker Diarization;
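DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract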
In this paper, we propose a discriminative extension to agglomerative hierarchical clustering, a typical technique for speaker diarization, that fits seamlessly with most state-of-the-art diarization algorithms. We propose to use maximum mutual information training with bootstrapping, i.e., initial predictions are used as input for retraining the models in an unsupervised fashion. This article describes the new approach, analyzes its behavior, and presents results on the official NIST Rich Transcription datasets. We show an absolute improvement of 4% in diarization error rate (DER) with respect to the generative baseline. We also observe a strong correlation between the original error and the amount of improvement: the better the predicted labels are, the more gain we obtain from discriminative training, which we interpret as a strong indication of the high potential of the extension.
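For orientation, the maximum mutual information criterion named in the abstract is commonly written in the following frame-level form. This is a generic textbook sketch, not the paper's own formulation: the symbols $x_t$, $\hat{c}_t$, $\theta$, and $P(c)$ are assumed here for illustration and do not appear in this record.

```latex
% Assumed notation (not from the source):
%   x_t        feature vector of frame t, for t = 1..T
%   \hat{c}_t  cluster (speaker) label predicted for frame t by the initial pass
%   \theta     parameters of the per-cluster models
%   P(c)       prior probability of cluster c
\mathcal{F}_{\mathrm{MMI}}(\theta)
  = \sum_{t=1}^{T} \log
    \frac{p(x_t \mid \hat{c}_t; \theta)\, P(\hat{c}_t)}
         {\sum_{c} p(x_t \mid c; \theta)\, P(c)}
```

Under this reading, maximizing the criterion raises the likelihood of each frame under its predicted cluster relative to all competing clusters, whereas generative training raises only the numerator. Because the labels $\hat{c}_t$ come from the system's own initial predictions, the retraining stays unsupervised, which matches the bootstrapping idea described in the abstract.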
Pages: 2326 / +
Page count: 2
Related papers
50 in total
  • [1] Interrelate Training and Clustering for Online Speaker Diarization
    Chen, Yifan
    Cheng, Gaofeng
    Yang, Runyan
    Zhang, Pengyuan
    Yan, Yonghong
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1352 - 1364
  • [2] Deep Self-Supervised Hierarchical Clustering for Speaker Diarization
    Singh, Prachi
    Ganapathy, Sriram
    [J]. INTERSPEECH 2020, 2020, : 294 - 298
  • [3] A Robust Stopping Criterion for Agglomerative Hierarchical Clustering in a Speaker Diarization System
    Han, Kyu J.
    Narayanan, Shrikanth S.
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1005 - 1008
  • [4] SiamTDNN: Enhancing Discriminative Embeddings for Speaker Diarization
    Zhang, Runqing
    Lu, Huijun
    Cai, Dunbo
    Huang, Zhiguo
    Du, Yujian
    Qian, Ling
    Zhang, Yijun
    [J]. JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2024, 33 (03)
  • [5] Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization
    Chen, Yifan
    Guo, Yifan
    Li, Qingxuan
    Cheng, Gaofeng
    Zhang, Pengyuan
    Yan, Yonghong
    [J]. INTERSPEECH 2022, 2022, : 1456 - 1460
  • [6] Spectral Clustering Approach to Speaker Diarization
    Ning, Huazhong
    Liu, Ming
    Tang, Hao
    Huang, Thomas
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2178 - 2181
  • [7] A Hybrid Generative-Discriminative Approach to Speaker Diarization
    Noulas, Athanasios K.
    van Kasteren, Tim
    Kroese, Ben J. A.
    [J]. MACHINE LEARNING FOR MULTIMODAL INTERACTION, PROCEEDINGS, 2008, 5237 : 98 - 109
  • [8] Phone Adaptive Training for Speaker Diarization
    Bozonnet, Simon
    Vipperla, Ravichander
    Evans, Nicholas
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 494 - 497
  • [9] Prosodic and Phonetic Features for Speaker Clustering in Speaker Diarization Systems
    Zibert, Janez
    Mihelic, France
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1040 - +
  • [10] SPEAKER DIARIZATION WITH UNSUPERVISED TRAINING FRAMEWORK
    Le Lan, Gael
    Meignier, Sylvain
    Charlet, Delphine
    Deleglise, Paul
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5560 - 5564