New Advances in Speaker Diarization

被引:2
|
作者
Aronowitz, Hagai [1 ]
Zhu, Weizhong [2 ]
Suzuki, Masayuki [3 ]
Kurata, Gakuto [3 ]
Hoory, Ron [1 ]
机构
[1] IBM Res AI, Haifa, Israel
[2] IBM Res AI, Yorktown Hts, NY USA
[3] IBM Res AI, Tokyo, Japan
来源
关键词
speaker diarization; agglomerative hierarchical clustering; spectral clustering; uncertainty modeling; short utterances; number of clusters estimation;
D O I
10.21437/Interspeech.2020-1879
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
Recently, speaker diarization based on speaker embeddings has shown excellent results in many works. In this paper we propose several enhancements throughout the diarization pipeline. This work addresses two clustering frameworks: agglomerative hierarchical clustering (AHC) and spectral clustering (SC). First, we use multiple speaker embeddings. We show that fusion of x-vectors and d-vectors boosts accuracy significantly. Second, we train neural networks to leverage both acoustic and duration information for scoring similarity of segments or clusters. Third, we introduce a novel method to guide the AHC clustering mechanism using a neural network. Fourth, we handle short duration segments in SC by deemphasizing their effect on setting the number of speakers. Finally, we propose a novel method for estimating the number of clusters in the SC framework. The method takes each eigenvalue and analyzes the projections of the SC similarity matrix on the corresponding eigenvector. We evaluated our system on NIST SRE 2000 CALLHOME and, using cross-validation, we achieved an error rate of 5.1%, going beyond state-of-the-art speaker diarization.
引用
收藏
页码:279 / 283
页数:5
相关论文
共 50 条
  • [41] SPEAKER DIARIZATION WITH UNSUPERVISED TRAINING FRAMEWORKL
    Le Lan, Gael
    Meignier, Sylvain
    Charlet, Delphine
    Deleglise, Paul
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5560 - 5564
  • [42] SPEAKER DIARIZATION AND LINKING OF LARGE CORPORA
    Ferras, Marc
    Bourlard, Herve
    [J]. 2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012, : 280 - 285
  • [43] Self-supervised Speaker Diarization
    Dissen, Yehoshua
    Kreuk, Felix
    Keshet, Joseph
    [J]. INTERSPEECH 2022, 2022, : 4013 - 4017
  • [44] Speaker Diarization for Meeting Room Audio
    Sun, Hanwu
    Nwe, Tin Lay
    Ma, Bin
    Li, Haizhou
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 888 - 891
  • [45] Spectral Clustering Approach to Speaker Diarization
    Ning, Huazhong
    Liu, Ming
    Tang, Hao
    Huang, Thomas
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2178 - 2181
  • [46] The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022
    Liu, Tao
    Xiang, Xu
    Chen, Zhengyang
    Han, Bing
    Yu, Kai
    Qian, Yanmin
    [J]. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 498 - 501
  • [47] Neural Network Speaker Descriptor in Speaker Diarization of Telephone Speech
    Zajic, Zbynek
    Zelinka, Jan
    Mueller, Ludek
    [J]. SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458 : 555 - 563
  • [48] Triplet Network with Attention for Speaker Diarization
    Song, Huan
    Willi, Megan
    Thiagarajan, Jayaraman J.
    Berisha, Visar
    Spanias, Andreas
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3608 - 3612
  • [49] pnf Improvements in speaker diarization system
    Fu, Rong
    Benest, Ian D.
    [J]. SIGMAP 2007: PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA APPLICATIONS, 2007, : 317 - +
  • [50] Extending the Task of Diarization to Speaker Attribution
    Ghaemmaghami, Houman
    Dean, David
    Vogt, Robbie
    Sridharan, Sridha
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1056 - 1059