New Advances in Speaker Diarization

被引:2
|
作者
Aronowitz, Hagai [1 ]
Zhu, Weizhong [2 ]
Suzuki, Masayuki [3 ]
Kurata, Gakuto [3 ]
Hoory, Ron [1 ]
机构
[1] IBM Res AI, Haifa, Israel
[2] IBM Res AI, Yorktown Hts, NY USA
[3] IBM Res AI, Tokyo, Japan
来源
关键词
speaker diarization; agglomerative hierarchical clustering; spectral clustering; uncertainty modeling; short utterances; number of clusters estimation;
D O I
10.21437/Interspeech.2020-1879
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
Recently, speaker diarization based on speaker embeddings has shown excellent results in many works. In this paper we propose several enhancements throughout the diarization pipeline. This work addresses two clustering frameworks: agglomerative hierarchical clustering (AHC) and spectral clustering (SC). First, we use multiple speaker embeddings. We show that fusion of x-vectors and d-vectors boosts accuracy significantly. Second, we train neural networks to leverage both acoustic and duration information for scoring similarity of segments or clusters. Third, we introduce a novel method to guide the AHC clustering mechanism using a neural network. Fourth, we handle short duration segments in SC by deemphasizing their effect on setting the number of speakers. Finally, we propose a novel method for estimating the number of clusters in the SC framework. The method takes each eigenvalue and analyzes the projections of the SC similarity matrix on the corresponding eigenvector. We evaluated our system on NIST SRE 2000 CALLHOME and, using cross-validation, we achieved an error rate of 5.1%, going beyond state-of-the-art speaker diarization.
引用
收藏
页码:279 / 283
页数:5
相关论文
共 50 条
  • [1] Speaker count: a new building block for speaker diarization
    Duong, Thanh Thi-Hien
    Nguyen, Phi-Le
    Nguyen, Hong-Son
    Nguyen, Duc-Chien
    Phan, Huy
    Duong, Ngoc Q. K.
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 1149 - 1155
  • [2] A review of speaker diarization: Recent advances with deep learning
    Park, Tae Jin
    Kanda, Naoyuki
    Dimitriadis, Dimitrios
    Han, Kyu J.
    Watanabe, Shinji
    Narayanan, Shrikanth
    [J]. COMPUTER SPEECH AND LANGUAGE, 2022, 72
  • [3] A Study of New Approaches to Speaker Diarization
    Reynolds, Douglas
    Kenny, Patrick
    Castaldo, Fabio
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1063 - +
  • [4] SPEAKER DIARIZATION THROUGH SPEAKER EMBEDDINGS
    Rouvier, Mickael
    Bousquet, Pierre-Michel
    Favre, Benoit
    [J]. 2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2082 - 2086
  • [5] SPEAKER DIARIZATION WITH LSTM
    Wang, Quan
    Downey, Carlton
    Wan, Li
    Mansfield, Philip Andrew
    Moreno, Ignacio Lopez
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5239 - 5243
  • [6] Multimodal Speaker Diarization
    Noulas, Athanasios
    Englebienne, Gwenn
    Krose, Ben J. A.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (01) : 79 - 93
  • [7] Trainable Speaker Diarization
    Aronowitz, Hagai
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2021 - 2024
  • [8] A NEW PENALTY TERM FOR THE BIC WITH RESPECT TO SPEAKER DIARIZATION
    Stafylakis, Themos
    Tzimiropoulos, Georgios
    Katsouros, Vassilis
    Carayannis, George
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4978 - 4981
  • [9] TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge
    Pang, Bowen
    Zhao, Huan
    Zhang, Gaosheng
    Yang, Xiaoyue
    Sun, Yang
    Zhang, Li
    Wang, Qing
    Xie, Lei
    [J]. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 502 - 506
  • [10] WHERE ARE THE CHALLENGES IN SPEAKER DIARIZATION?
    Sinclair, Mark
    King, Simon
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7741 - 7745