Improving speaker diarization by cross EM refinement

Times cited: 2
Authors
Ning, Huazhong [1]
Xu, Wei [2]
Gong, Yihong [2]
Huang, Thomas [1]
Affiliations
[1] Univ Illinois, Beckman Inst, Urbana, IL 61801 USA
[2] NEC Lab America Inc, Cupertino, CA 95070 USA
Keywords
cross EM refinement; hierarchical clustering; BIC; speaker diarization
DOI
10.1109/ICME.2006.262927
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a new speaker diarization system that improves the accuracy of traditional hierarchical clustering-based methods with little increase in computational cost. Our contributions are mainly twofold. First, we add a preprocessing step called "local clustering" before the hierarchical clustering algorithm to merge very similar adjacent speech segments. Local clustering reduces the number of segments that the hierarchical clustering must process, which dramatically increases processing speed. Second, we perform a postprocessing step called "cross EM refinement" to purify the clusters generated by the hierarchical clustering. This algorithm is based on the ideas of cross-validation and the EM algorithm. Our experimental evaluations show that the proposed cross EM refinement approach reduces the speaker diarization error by up to 56%, with an average reduction of 22%, compared to the traditional hierarchical clustering method.
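The abstract does not say which similarity test the "local clustering" step uses. As a rough illustration only, the sketch below merges adjacent segments with the widely used delta-BIC test (BIC appears among the keywords, but applying it at this step is an assumption). The function names `delta_bic` and `merge_adjacent`, the single full-covariance Gaussian per segment, and the `penalty_weight` parameter are all hypothetical choices, not the authors' implementation.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC for two feature matrices of shape (frames, dims).

    Compares one full-covariance Gaussian for the pooled data against
    one Gaussian per segment. A positive value favours keeping the
    segments separate; a negative value favours merging them.
    """
    z = np.vstack([x, y])
    n, d = z.shape

    def logdet_cov(a):
        # Maximum-likelihood covariance with a small ridge for stability.
        cov = np.cov(a, rowvar=False, bias=True) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    # Extra free parameters of the two-model hypothesis: one mean vector
    # plus one symmetric covariance matrix.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    gain = 0.5 * (n * logdet_cov(z)
                  - len(x) * logdet_cov(x)
                  - len(y) * logdet_cov(y))
    return gain - penalty


def merge_adjacent(segments, penalty_weight=1.0):
    """Greedy left-to-right merge of adjacent segments (assumed scheme)."""
    merged = [segments[0]]
    for seg in segments[1:]:
        if delta_bic(merged[-1], seg, penalty_weight) < 0:
            # Similar enough: absorb the segment into the running one.
            merged[-1] = np.vstack([merged[-1], seg])
        else:
            merged.append(seg)
    return merged
```

Because only neighbouring segments are compared, a pass like this costs a number of tests linear in the segment count, which is consistent with the abstract's claim that the step shrinks the input to the more expensive hierarchical clustering.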
Pages: 1901-1904
Page count: 4