New Advances in Speaker Diarization

Cited by: 2

Authors:

Aronowitz, Hagai [1]

Zhu, Weizhong [2]

Suzuki, Masayuki [3]

Kurata, Gakuto [3]

Hoory, Ron [1]

Affiliations:

[1] IBM Res AI, Haifa, Israel

[2] IBM Res AI, Yorktown Hts, NY USA

[3] IBM Res AI, Tokyo, Japan

Source:

INTERSPEECH 2020 | 2020

Keywords:

speaker diarization; agglomerative hierarchical clustering; spectral clustering; uncertainty modeling; short utterances; number of clusters estimation

DOI:

10.21437/Interspeech.2020-1879

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Recently, speaker diarization based on speaker embeddings has shown excellent results in many works. In this paper we propose several enhancements throughout the diarization pipeline. This work addresses two clustering frameworks: agglomerative hierarchical clustering (AHC) and spectral clustering (SC). First, we use multiple speaker embeddings. We show that fusion of x-vectors and d-vectors boosts accuracy significantly. Second, we train neural networks to leverage both acoustic and duration information for scoring similarity of segments or clusters. Third, we introduce a novel method to guide the AHC clustering mechanism using a neural network. Fourth, we handle short duration segments in SC by deemphasizing their effect on setting the number of speakers. Finally, we propose a novel method for estimating the number of clusters in the SC framework. The method takes each eigenvalue and analyzes the projections of the SC similarity matrix on the corresponding eigenvector. We evaluated our system on NIST SRE 2000 CALLHOME and, using cross-validation, we achieved an error rate of 5.1%, going beyond state-of-the-art speaker diarization.
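
The abstract names embedding fusion and AHC but does not say how they are carried out. As a rough illustration only, the sketch below assumes score-level fusion (a weighted average of the cosine similarity matrices from the two embedding types) followed by plain average-linkage AHC with a fixed stopping threshold. The function names, the fusion weight, the threshold, and the toy vectors are all hypothetical and are not taken from the paper, which uses neural networks for scoring and for guiding the clustering.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def fused_similarity(xvecs, dvecs, weight=0.5):
    """Assumed score-level fusion: weighted mean of the two cosine matrices."""
    n = len(xvecs)
    return [
        [
            weight * cosine(xvecs[i], xvecs[j])
            + (1.0 - weight) * cosine(dvecs[i], dvecs[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def ahc(sim, threshold):
    """Average-linkage AHC over a similarity matrix.

    Repeatedly merges the most similar pair of clusters and stops when the
    best pair's mean pairwise similarity falls below the threshold.
    Returns one cluster label per segment.
    """
    clusters = [[i] for i in range(len(sim))]

    def linkage(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best_score, best_pair = max(
            (linkage(clusters[p], clusters[q]), (p, q))
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if best_score < threshold:
            break
        p, q = best_pair
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    labels = [0] * len(sim)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy data (made up): four segments, two per speaker.
xvecs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
dvecs = [[0.8, 0.0, 0.2], [0.9, 0.1, 0.1], [0.0, 1.0, 0.3], [0.1, 0.8, 0.2]]
labels = ahc(fused_similarity(xvecs, dvecs), threshold=0.5)
print(labels)  # prints [0, 0, 1, 1]
```

In this sketch the threshold plays the role of the stopping criterion: a lower value merges more aggressively and yields fewer speakers, a higher value leaves more clusters.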

Pages: 279-283

Number of pages: 5
