New Advances in Speaker Diarization

Cited by: 2

Authors:

Aronowitz, Hagai [1]

Zhu, Weizhong [2]

Suzuki, Masayuki [3]

Kurata, Gakuto [3]

Hoory, Ron [1]

Affiliations:

[1] IBM Res AI, Haifa, Israel

[2] IBM Res AI, Yorktown Hts, NY USA

[3] IBM Res AI, Tokyo, Japan

Source:

INTERSPEECH 2020 | 2020

Keywords:

speaker diarization; agglomerative hierarchical clustering; spectral clustering; uncertainty modeling; short utterances; number of clusters estimation;

DOI:

10.21437/Interspeech.2020-1879

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

Recently, speaker diarization based on speaker embeddings has shown excellent results in many works. In this paper we propose several enhancements throughout the diarization pipeline. This work addresses two clustering frameworks: agglomerative hierarchical clustering (AHC) and spectral clustering (SC). First, we use multiple speaker embeddings. We show that fusion of x-vectors and d-vectors boosts accuracy significantly. Second, we train neural networks to leverage both acoustic and duration information for scoring similarity of segments or clusters. Third, we introduce a novel method to guide the AHC clustering mechanism using a neural network. Fourth, we handle short duration segments in SC by deemphasizing their effect on setting the number of speakers. Finally, we propose a novel method for estimating the number of clusters in the SC framework. The method takes each eigenvalue and analyzes the projections of the SC similarity matrix on the corresponding eigenvector. We evaluated our system on NIST SRE 2000 CALLHOME and, using cross-validation, we achieved an error rate of 5.1%, going beyond state-of-the-art speaker diarization.
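The abstract mentions estimating the number of clusters from the eigen-structure of the spectral clustering similarity matrix. The sketch below is not the authors' method, whose details are not given here; it shows only the standard eigengap heuristic on a normalized graph Laplacian, as a point of reference for how eigenvalues can suggest a speaker count. The function name `estimate_num_speakers`, the `max_speakers` parameter, and the toy similarity values are assumptions made for illustration.

```python
import numpy as np


def estimate_num_speakers(affinity, max_speakers=8):
    """Guess the number of clusters with the classic eigengap heuristic.

    Illustrative baseline only (an assumption, not the paper's estimator).
    `affinity` is a symmetric matrix of non-negative segment similarities
    in which every row has a positive sum.
    """
    affinity = np.asarray(affinity, dtype=float)
    n = affinity.shape[0]

    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(n) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(laplacian)

    # The largest gap between consecutive small eigenvalues marks the count:
    # k near-zero eigenvalues followed by a jump suggests k clusters.
    k_max = min(max_speakers, n - 1)
    gaps = np.diff(eigvals[: k_max + 1])
    return int(np.argmax(gaps)) + 1


# Toy example: six segments, two speakers with three segments each.
within, across = 0.9, 0.05
toy = np.full((6, 6), across)
toy[:3, :3] = within
toy[3:, 3:] = within
np.fill_diagonal(toy, 1.0)
num_speakers = estimate_num_speakers(toy)
```

On real recordings the gap is rarely this clean, which is presumably why the paper proposes a different, projection-based analysis and deemphasizes short segments.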


Pages: 279-283

Page count: 5

50 items in total

[1] Speaker count: a new building block for speaker diarization
Duong, Thanh Thi-Hien
Nguyen, Phi-Le
Nguyen, Hong-Son
Nguyen, Duc-Chien
Phan, Huy
Duong, Ngoc Q. K.
[J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 1149 - 1155
[2] A review of speaker diarization: Recent advances with deep learning
Park, Tae Jin
Kanda, Naoyuki
Dimitriadis, Dimitrios
Han, Kyu J.
Watanabe, Shinji
Narayanan, Shrikanth
[J]. COMPUTER SPEECH AND LANGUAGE, 2022, 72
[3] A Study of New Approaches to Speaker Diarization
Reynolds, Douglas
Kenny, Patrick
Castaldo, Fabio
[J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1063 - +
[4] SPEAKER DIARIZATION THROUGH SPEAKER EMBEDDINGS
Rouvier, Mickael
Bousquet, Pierre-Michel
Favre, Benoit
[J]. 2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2082 - 2086
[5] SPEAKER DIARIZATION WITH LSTM
Wang, Quan
Downey, Carlton
Wan, Li
Mansfield, Philip Andrew
Moreno, Ignacio Lopez
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5239 - 5243
[6] Multimodal Speaker Diarization
Noulas, Athanasios
Englebienne, Gwenn
Krose, Ben J. A.
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (01) : 79 - 93
[7] Trainable Speaker Diarization
Aronowitz, Hagai
[J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2021 - 2024
[8] A NEW PENALTY TERM FOR THE BIC WITH RESPECT TO SPEAKER DIARIZATION
Stafylakis, Themos
Tzimiropoulos, Georgios
Katsouros, Vassilis
Carayannis, George
[J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4978 - 4981
[9] TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge
Pang, Bowen
Zhao, Huan
Zhang, Gaosheng
Yang, Xiaoyue
Sun, Yang
Zhang, Li
Wang, Qing
Xie, Lei
[J]. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 502 - 506
[10] WHERE ARE THE CHALLENGES IN SPEAKER DIARIZATION?
Sinclair, Mark
King, Simon
[J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7741 - 7745
