DNN-based speaker clustering for speaker diarisation

被引:13
|
作者
Milner, Rosanna [1 ]
Hain, Thomas [1 ]
机构
[1] Univ Sheffield, Speech & Hearing Res Grp, Sheffield, S Yorkshire, England
基金
英国工程与自然科学研究理事会;
关键词
speaker diarisation; speaker separation; deep neural network; DIARIZATION;
D O I
10.21437/Interspeech.2016-126
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Speaker diarisation, the task of answering "who spoke when?", is often considered to consist of three independent stages: speech activity detection, speaker segmentation and speaker clustering. These represent the separation of speech and non speech, the splitting into speaker homogeneous speech segments, followed by grouping together those which belong to the same speaker. This paper is concerned with speaker clustering, which is typically performed by bottom-up clustering using the Bayesian information criterion (BIC). We present a novel semi-supervised method of speaker clustering based on a deep neural network (DNN) model. A speaker separation DNN trained on independent data is used to iteratively relabel the test data set. This is achieved by reconfiguration of the output layer, combined with fine tuning in each iteration. A stopping criterion involving posteriors as confidence scores is investigated. Results are shown on a meeting task (RT07) for single distant microphones and compared with standard diarisation approaches. The new method achieves a diarisation error rate (DER) of 14.8%, compared to a baseline of 19.9%.
引用
收藏
页码:2185 / 2189
页数:5
相关论文
共 50 条
  • [1] DNN APPROACH TO SPEAKER DIARISATION USING SPEAKER CHANNELS
    Milner, Rosanna
    Hain, Thomas
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4925 - 4929
  • [2] DISCRIMINATIVE NEURAL CLUSTERING FOR SPEAKER DIARISATION
    Li, Qiujia
    Kreyssig, Florian L.
    Zhang, Chao
    Woodland, Philip C.
    [J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 574 - 581
  • [3] MULTI-SPEAKER MODELING AND SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4475 - 4479
  • [4] SPEAKER AND LANGUAGE FACTORIZATION IN DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5540 - 5544
  • [5] UNSUPERVISED SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5135 - 5139
  • [6] A DNN-based emotional speech synthesis by speaker adaptation
    Yang, Hongwu
    Zhang, Weizhao
    Zhi, Pengpeng
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 633 - 637
  • [7] On the Issue of Calibration in DNN-based Speaker Recognition Systems
    McLaren, Mitchell
    Castan, Diego
    Ferrer, Luciana
    Lawson, Aaron
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1825 - 1829
  • [8] DNN-based Models for Speaker Age and Gender Classification
    Qawaqneh, Zakariya
    Abu Mallouh, Arafat
    Barkana, Buket D.
    [J]. PROCEEDINGS OF THE 10TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 4: BIOSIGNALS, 2017, : 106 - 111
  • [9] DNN-Based Speech Synthesis Using Speaker Codes
    Hojo, Nobukatsu
    Ijima, Yusuke
    Mizuno, Hideyuki
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (02): : 462 - 472
  • [10] A study of speaker adaptation for DNN-based speech synthesis
    Wu, Zhizheng
    Swietojanski, Pawel
    Veaux, Christophe
    Renals, Steve
    King, Simon
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 879 - 883