New Advances in Speaker Diarization

Cited by: 2

Authors:

Aronowitz, Hagai [1]

Zhu, Weizhong [2]

Suzuki, Masayuki [3]

Kurata, Gakuto [3]

Hoory, Ron [1]

Affiliations:

[1] IBM Res AI, Haifa, Israel

[2] IBM Res AI, Yorktown Hts, NY USA

[3] IBM Res AI, Tokyo, Japan

Source:

INTERSPEECH 2020 | 2020

Keywords:

speaker diarization; agglomerative hierarchical clustering; spectral clustering; uncertainty modeling; short utterances; number of clusters estimation

DOI:

10.21437/Interspeech.2020-1879

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Recently, speaker diarization based on speaker embeddings has shown excellent results in many works. In this paper we propose several enhancements throughout the diarization pipeline. This work addresses two clustering frameworks: agglomerative hierarchical clustering (AHC) and spectral clustering (SC). First, we use multiple speaker embeddings. We show that fusion of x-vectors and d-vectors boosts accuracy significantly. Second, we train neural networks to leverage both acoustic and duration information for scoring similarity of segments or clusters. Third, we introduce a novel method to guide the AHC clustering mechanism using a neural network. Fourth, we handle short duration segments in SC by deemphasizing their effect on setting the number of speakers. Finally, we propose a novel method for estimating the number of clusters in the SC framework. The method takes each eigenvalue and analyzes the projections of the SC similarity matrix on the corresponding eigenvector. We evaluated our system on NIST SRE 2000 CALLHOME and, using cross-validation, we achieved an error rate of 5.1%, going beyond state-of-the-art speaker diarization.
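As a rough illustration of two ideas mentioned in the abstract (fusing similarity scores from two embedding types, and choosing the number of speakers from the spectrum of a similarity matrix), the following is a minimal sketch under stated assumptions. It is not the authors' method: the fusion here is a plain weighted average, and the speaker count uses the standard eigengap heuristic rather than the projection analysis the paper proposes. The function names, the `weight` parameter, and the `max_speakers` cap are all hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of `embeddings`."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T


def fuse_similarities(sim_a, sim_b, weight=0.5):
    """Score-level fusion as a weighted average (an assumption, not the paper's fusion)."""
    return weight * sim_a + (1.0 - weight) * sim_b


def estimate_num_speakers(similarity, max_speakers=8):
    """Eigengap heuristic on the normalized graph Laplacian (a common baseline)."""
    # Keep only non-negative affinities and ignore self-similarity.
    affinity = np.clip(similarity, 0.0, None)
    np.fill_diagonal(affinity, 0.0)

    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    laplacian = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

    # eigvalsh returns eigenvalues in ascending order for symmetric matrices.
    eigenvalues = np.linalg.eigvalsh(laplacian)
    limit = min(max_speakers, len(eigenvalues) - 1)

    # The largest gap between consecutive small eigenvalues marks the cluster count.
    gaps = np.diff(eigenvalues[: limit + 1])
    return int(np.argmax(gaps)) + 1
```

In practice the two inputs would be segment-level similarity matrices computed from x-vectors and d-vectors for the same segments; the estimated count could then be passed to any spectral clustering routine.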


Pages: 279-283

Page count: 5
