DNN-based speaker clustering for speaker diarisation

Cited by: 13

Authors:

Milner, Rosanna [1]

Hain, Thomas [1]

Affiliations:

[1] University of Sheffield, Speech and Hearing Research Group, Sheffield, South Yorkshire, England

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK

Keywords:

speaker diarisation; speaker separation; deep neural network; DIARIZATION;

DOI:

10.21437/Interspeech.2016-126

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Speaker diarisation, the task of answering "who spoke when?", is often considered to consist of three independent stages: speech activity detection, speaker segmentation and speaker clustering. These represent the separation of speech and non-speech, the splitting into speaker-homogeneous speech segments, followed by grouping together those which belong to the same speaker. This paper is concerned with speaker clustering, which is typically performed by bottom-up clustering using the Bayesian information criterion (BIC). We present a novel semi-supervised method of speaker clustering based on a deep neural network (DNN) model. A speaker separation DNN trained on independent data is used to iteratively relabel the test data set. This is achieved by reconfiguration of the output layer, combined with fine-tuning in each iteration. A stopping criterion involving posteriors as confidence scores is investigated. Results are shown on a meeting task (RT07) for single distant microphones and compared with standard diarisation approaches. The new method achieves a diarisation error rate (DER) of 14.8%, compared to a baseline of 19.9%.
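
For readers unfamiliar with the metric, the following is a minimal sketch of how a DER figure is conventionally formed and how the two figures quoted in the abstract compare. It assumes the usual definition (missed speech, false alarm and speaker error time, summed and divided by total reference speech time). The function names are hypothetical and not taken from the paper, and the abstract does not give the error breakdown, so only the two quoted percentages are used as data.

```python
def diarisation_error_rate(missed, false_alarm, speaker_error, total_speech):
    """DER as a fraction: summed error time over total reference speech time.

    All arguments are durations in the same unit (e.g. seconds).
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + speaker_error) / total_speech


def relative_reduction(baseline, new):
    """Relative reduction of `new` with respect to `baseline`, as a fraction."""
    return (baseline - new) / baseline


# Percent DER figures quoted in the abstract (RT07, single distant microphone).
baseline_der = 19.9
dnn_der = 14.8

absolute_gain = baseline_der - dnn_der                    # percentage points
relative_gain = relative_reduction(baseline_der, dnn_der)  # fraction of baseline

print(f"absolute: {absolute_gain:.1f} points, relative: {relative_gain:.1%}")
```

Run as written, this reports an absolute reduction of 5.1 percentage points, which is roughly a quarter of the baseline error.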

Pages: 2185-2189

Page count: 5
