DNN-based speaker clustering for speaker diarisation

Times cited: 13
Authors
Milner, Rosanna [1]
Hain, Thomas [1]
Affiliations
[1] Univ Sheffield, Speech & Hearing Res Grp, Sheffield, S Yorkshire, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
speaker diarisation; speaker separation; deep neural network; diarization
DOI
10.21437/Interspeech.2016-126
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speaker diarisation, the task of answering "who spoke when?", is often considered to consist of three independent stages: speech activity detection, speaker segmentation and speaker clustering. These represent the separation of speech and non-speech, the splitting into speaker-homogeneous speech segments, and the grouping together of those segments which belong to the same speaker. This paper is concerned with speaker clustering, which is typically performed by bottom-up clustering using the Bayesian information criterion (BIC). We present a novel semi-supervised method of speaker clustering based on a deep neural network (DNN) model. A speaker separation DNN trained on independent data is used to iteratively relabel the test data set. This is achieved by reconfiguration of the output layer, combined with fine-tuning in each iteration. A stopping criterion involving posteriors as confidence scores is investigated. Results are shown on a meeting task (RT07) for single distant microphones and compared with standard diarisation approaches. The new method achieves a diarisation error rate (DER) of 14.8%, compared to a baseline of 19.9%.
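For readers unfamiliar with the BIC-based bottom-up clustering that the abstract names as the typical approach, the following is a minimal sketch of a conventional pairwise merge test. It is not taken from the paper and does not implement the proposed DNN method. It assumes each cluster is modelled by a single full-covariance Gaussian; the function name `delta_bic` and the parameter `penalty_weight` are illustrative choices, and the sign convention (merge when the value is negative) is one of several in use.

```python
import numpy as np


def delta_bic(x1, x2, penalty_weight=1.0):
    """Delta-BIC between modelling two clusters separately or merged.

    x1, x2: arrays of shape (frames, dims) holding feature vectors.
    Assumption: one full-covariance Gaussian per cluster.
    A negative value favours merging the two clusters.
    """
    n1, d = x1.shape
    n2 = x2.shape[0]
    n = n1 + n2

    def logdet_cov(x):
        # Maximum-likelihood covariance (bias=True divides by N).
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        _, value = np.linalg.slogdet(cov)
        return value

    pooled = np.vstack([x1, x2])
    # Parameters saved by merging: one mean plus one symmetric covariance.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    gain = 0.5 * (n * logdet_cov(pooled)
                  - n1 * logdet_cov(x1)
                  - n2 * logdet_cov(x2))
    return gain - penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same_a = rng.normal(0.0, 1.0, size=(200, 3))
    same_b = rng.normal(0.0, 1.0, size=(200, 3))
    other = rng.normal(5.0, 1.0, size=(200, 3))
    print("merge same-source clusters:", delta_bic(same_a, same_b) < 0)
    print("merge different-source clusters:", delta_bic(same_a, other) < 0)
```

In a bottom-up scheme of this kind, the pair of clusters with the lowest value is merged repeatedly until no pair scores below zero. The synthetic Gaussian data above stands in for real acoustic features purely for illustration.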
Pages: 2185-2189
Page count: 5