Robust Speaker Diarization for Short Speech Recordings

Cited by: 11

Authors:

Imseng, David [1,2]

Friedland, Gerald [3]

Affiliations:

[1] Idiap Res Inst, POB 592, CH-1920 Martigny, Switzerland

[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland

[3] Int Comp Sci Inst, Berkeley, CA 94704 USA

Source:

2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009) | 2009

DOI:

10.1109/ASRU.2009.5373254

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We investigate a state-of-the-art Speaker Diarization system regarding its behavior on meetings that are much shorter (from 500 seconds down to 100 seconds) than those typically analyzed in Speaker Diarization benchmarks. First, the problems inherent to this task are analyzed. Then, we propose an approach that consists of a novel initialization parameter estimation method for typical state-of-the-art diarization approaches. The estimation method balances the relationship between the optimal value of the duration of speech data per Gaussian and the duration of the speech data, which is verified experimentally for the first time in this article. As a result, the Diarization Error Rate for short meetings extracted from the 2006, 2007, and 2009 NIST RT evaluation data is decreased by up to 50 % relative.

Pages: 432 / +

Page count: 2

50 items in total

[21] The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022
Liu, Tao
Xiang, Xu
Chen, Zhengyang
Han, Bing
Yu, Kai
Qian, Yanmin
2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 498 - 501
[22] Speaker diarization: Towards a more robust and portable system
El Khoury, Elie
Senac, Christine
Andre-Obrecht, Regine
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 489 - +
[23] Tuning-Robust Initialization Methods for Speaker Diarization
Imseng, David
Friedland, Gerald
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (08): : 2028 - 2037
[24] Information Bottleneck Features for HMM/GMM Speaker Diarization of Meetings Recordings
Yella, Sree Harsha
Valente, Fabio
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 960 - 963
[25] Robust speaker clustering strategies to data source variation for improved speaker diarization
Han, Kyu J.
Kim, Samuel
Narayanan, Shrikanth S.
2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 262 - 267
[26] Overlapped speech detection for improved speaker diarization in multiparty meetings
Boakye, Kofi
Trueba-Hornero, Beatriz
Vinyals, Oriol
Friedland, Gerald
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4353 - 4356
[27] Methodologies for the evaluation of Speaker Diarization and Automatic Speech Recognition in the presence of overlapping speech
Galibert, Olivier
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1130 - 1133
[28] Revolutionizing Speaker Recognition and Diarization: A Novel Methodology in Speech Analysis
Ravi D. Shankar
R. B. Manjula
Rajashekhar C. Biradar
SN Computer Science, 6 (1)
[29] Neural speech turn segmentation and affinity propagation for speaker diarization
Yin, Ruiqing
Bredin, Herve
Barras, Claude
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1393 - 1397
[30] Joint Speech Recognition and Speaker Diarization via Sequence Transduction
El Shafey, Laurent
Soltau, Hagen
Shafran, Izhak
INTERSPEECH 2019, 2019, : 396 - 400
