DNN-based speaker clustering for speaker diarisation

Cited by: 13
Authors
Milner, Rosanna [1]
Hain, Thomas [1]
Affiliations
[1] Univ Sheffield, Speech & Hearing Res Grp, Sheffield, S Yorkshire, England
Funding
Engineering and Physical Sciences Research Council (UK);
Keywords
speaker diarisation; speaker separation; deep neural network; DIARIZATION;
DOI
10.21437/Interspeech.2016-126
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Speaker diarisation, the task of answering "who spoke when?", is often considered to consist of three independent stages: speech activity detection, speaker segmentation and speaker clustering. These stages correspond to separating speech from non-speech, splitting the speech into speaker-homogeneous segments, and grouping together the segments that belong to the same speaker. This paper is concerned with speaker clustering, which is typically performed by bottom-up clustering using the Bayesian information criterion (BIC). We present a novel semi-supervised method of speaker clustering based on a deep neural network (DNN) model. A speaker separation DNN trained on independent data is used to iteratively relabel the test data set. This is achieved by reconfiguration of the output layer, combined with fine-tuning in each iteration. A stopping criterion involving posteriors as confidence scores is investigated. Results are shown on a meeting task (RT07) for single distant microphones and compared with standard diarisation approaches. The new method achieves a diarisation error rate (DER) of 14.8%, compared to a baseline of 19.9%.
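The abstract names BIC-based bottom-up clustering as the typical approach but does not spell out the criterion. As a rough illustration only, the sketch below computes the commonly used ΔBIC merge test for two segments, each modelled by a single Gaussian. It assumes diagonal covariances to stay dependency-free, and the names `delta_bic`, `log_det_diag_cov`, `make_segment` and `penalty_weight` are hypothetical, not taken from the paper. It is not the authors' baseline system.

```python
import math
import random


def log_det_diag_cov(frames):
    """Log-determinant of a diagonal ML covariance: sum of log per-dimension variances."""
    n = len(frames)
    dims = len(frames[0])
    total = 0.0
    for k in range(dims):
        column = [frame[k] for frame in frames]
        mean = sum(column) / n
        variance = sum((v - mean) ** 2 for v in column) / n
        total += math.log(variance)
    return total


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC between modelling two segments separately versus as one Gaussian.

    A negative value favours merging (likely the same speaker); a positive
    value favours keeping the segments apart. Diagonal covariances are an
    assumption made here for simplicity.
    """
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    dims = len(seg_a[0])
    # Log-likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (
        n * log_det_diag_cov(seg_a + seg_b)
        - n_a * log_det_diag_cov(seg_a)
        - n_b * log_det_diag_cov(seg_b)
    )
    # The extra Gaussian adds one mean and one variance per dimension.
    n_extra_params = 2 * dims
    penalty = 0.5 * penalty_weight * n_extra_params * math.log(n)
    return gain - penalty


def make_segment(rng, mean, n_frames=300, dims=4):
    """Synthetic stand-in for acoustic feature frames (illustration only)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dims)] for _ in range(n_frames)]


rng = random.Random(0)
same_a = make_segment(rng, 0.0)
same_b = make_segment(rng, 0.0)
other = make_segment(rng, 3.0)

print("same speaker:", delta_bic(same_a, same_b))
print("different speakers:", delta_bic(same_a, other))
```

In a bottom-up clusterer, the pair of clusters with the lowest ΔBIC would be merged repeatedly until no pair scores below zero; `penalty_weight` is the usual tunable factor on the model-complexity term.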
Pages: 2185-2189
Number of pages: 5