Analysis of Phonetic Dependence of Segmentation Errors in Speaker Diarization

Authors:

McKnight, Simon W. [1]

Hogg, Aidan O. T. [1]

Naylor, Patrick A. [1]

Affiliations:

[1] Imperial College London, Department of Electrical and Electronic Engineering, London, England

Source:

28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020) | 2021

Keywords:

Speaker diarization; forgiveness collar; phoneme boundary; diarization scoring

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Evaluation of speaker segmentation and diarization normally makes use of forgiveness collars around ground truth speaker segment boundaries, such that estimated speaker segment boundaries falling within such collars are considered completely correct. This paper shows that the popular recent approach of removing forgiveness collars from speaker diarization evaluation tools can unfairly penalize speaker diarization systems that correctly estimate speaker segment boundaries. The uncertainty in identifying the start and/or end of a particular phoneme means that the ground truth segmentation is not perfectly accurate, and even trained human listeners are unable to identify phoneme boundaries with full consistency. This research analyses the phoneme dependence of this uncertainty and shows that it depends on (i) whether the phoneme being detected is at the start or end of an utterance and (ii) what the phoneme is, so that the use of a uniform forgiveness collar is inadequate. This analysis is expected to point the way towards more indicative and repeatable assessment of the performance of speaker diarization systems.
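
The collar idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' scoring tool: the function names, the treatment of the collar as a half-width in seconds on each side of the ground truth boundary, and every numeric value below are assumptions chosen only for illustration. The lookup table shows one way a collar could vary with phoneme identity and utterance position instead of being uniform.

```python
from typing import Dict, Tuple

# Hypothetical collar half-widths in seconds, keyed by (phoneme, position).
# These numbers are placeholders for illustration, not results from the paper.
EXAMPLE_COLLARS: Dict[Tuple[str, str], float] = {
    ("s", "start"): 0.10,
    ("m", "end"): 0.30,
}
DEFAULT_COLLAR = 0.25  # assumed uniform fallback, in seconds


def within_collar(ref_time: float, est_time: float, collar: float) -> bool:
    """Return True if the estimated boundary lies within +/- collar
    seconds of the ground truth boundary."""
    return abs(est_time - ref_time) <= collar


def collar_for(
    phoneme: str,
    position: str,
    collars: Dict[Tuple[str, str], float] = EXAMPLE_COLLARS,
    default: float = DEFAULT_COLLAR,
) -> float:
    """Look up a phoneme- and position-dependent collar, falling back
    to the uniform default when no specific entry exists."""
    return collars.get((phoneme, position), default)


if __name__ == "__main__":
    # Same 0.2 s boundary error, judged under two different collars.
    print(within_collar(2.0, 2.2, collar_for("s", "start")))
    print(within_collar(2.0, 2.2, collar_for("m", "end")))
```

With a collar of zero, `within_collar` accepts only an exact match, which corresponds to the collar-free evaluation that the abstract argues can penalize correct systems when the ground truth boundary is itself uncertain.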

Pages: 381 - 385

Page count: 5

Related records (50 in total):

[1] Factor Analysis for Speaker Segmentation and Improved Speaker Diarization
Desplanques, Brecht
Demuynck, Kris
Martens, Jean-Pierre
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3081 - 3085
[2] Prosodic and Phonetic Features for Speaker Clustering in Speaker Diarization Systems
Zibert, Janez
Mihelic, France
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1040 - +
[3] Phonetic Subspace Mixture Model for Speaker Diarization
Chen, I-Fan
Cheng, Shih-Sian
Wang, Hsin-Min
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2298 - +
[4] Bayes Factor Based Speaker Segmentation for Speaker Diarization
Speech and Audio Research Laboratory, Queensland University of Technology, Brisbane, Australia
Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH, : 1405 - 1408
[5] Bayes Factor Based Speaker Segmentation for Speaker Diarization
Wang, D.
Vogt, R.
Sridharan, S.
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 1405 - 1408
[6] Experiments with Segmentation in an Online Speaker Diarization System
Kunesova, Marie
Zajic, Zbynek
Radova, Vlasta
TEXT, SPEECH, AND DIALOGUE, TSD 2017, 2017, 10415 : 429 - 437
[7] Neural speech turn segmentation and affinity propagation for speaker diarization
Yin, Ruiqing
Bredin, Herve
Barras, Claude
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1393 - 1397
[8] Optimized speaker change detection approach for speaker segmentation towards speaker diarization based on deep learning
VijayKumar, K.
Rao, R. Rajeswara
DATA & KNOWLEDGE ENGINEERING, 2023, 144
[9] Incorporation of the ASR Output in Speaker Segmentation and Clustering within the Task of Speaker Diarization of Broadcast Streams
Silovsky, Jan
Zdansky, Jindrich
Nouza, Jan
Cerva, Petr
Prazak, Jan
2012 IEEE 14TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2012, : 118 - 123
[10] BAYESIAN ANALYSIS OF SIMILARITY MATRICES FOR SPEAKER DIARIZATION
Sholokhov, Alexey
Pekhovsky, Timur
Kudashev, Oleg
Shulipa, Andrei
Kinnunen, Tomi
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
