DISCRIMINATIVE NEURAL CLUSTERING FOR SPEAKER DIARISATION

被引:13
|
作者
Li, Qiujia [1 ]
Kreyssig, Florian L. [1 ]
Zhang, Chao [1 ]
Woodland, Philip C. [1 ]
机构
[1] Univ Cambridge, Engn Dept, Trumpington St, Cambridge CB2 1PZ, England
关键词
speaker diarisation; supervised clustering; discriminative neural clustering; Transformer; DIARIZATION;
D O I
10.1109/SLT48900.2021.9383617
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem. Compared to traditional unsupervised clustering algorithms, DNC learns clustering patterns from training data without requiring an explicit definition of a similarity measure. An implementation of DNC based on the Transformer architecture is shown to be effective on a speaker diarisation task using the challenging AMI dataset. Since AMI contains only 147 complete meetings as individual input sequences, data scarcity is a significant issue for training a Transformer model for DNC. Accordingly, this paper proposes three data augmentation schemes: sub-sequence randomisation, input vector randomisation, and Diaconis augmentation, which generates new data samples by rotating the entire input sequence of L2-normalised speaker embeddings. Experimental results on AMI show that DNC achieves a reduction in speaker error rate (SER) of 29.4% relative to spectral clustering.
引用
收藏
页码:574 / 581
页数:8
相关论文
共 50 条
  • [1] DNN-based speaker clustering for speaker diarisation
    Milner, Rosanna
    Hain, Thomas
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2185 - 2189
  • [2] Adapting Speaker Embeddings for Speaker Diarisation
    Kwon, Youngki
    Jung, Jee-weon
    Heo, Hee-Soo
    Kim, You Jin
    Lee, Bong-Jin
    Chung, Joon Son
    [J]. INTERSPEECH 2021, 2021, : 3101 - 3105
  • [3] CONTENT-AWARE SPEAKER EMBEDDINGS FOR SPEAKER DIARISATION
    Sun, G.
    Liu, D.
    Zhang, C.
    Woodland, P. C.
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7168 - 7172
  • [4] Combination of deep speaker embeddings for diarisation
    Sun, Guangzhi
    Zhang, Chao
    Woodland, Philip C.
    [J]. NEURAL NETWORKS, 2021, 141 : 372 - 384
  • [5] Discriminative Training for Hierarchical Clustering in Speaker Diarization
    Vinyals, Oriol
    Friedland, Gerald
    Morgan, Nelson
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2326 - +
  • [6] DNN APPROACH TO SPEAKER DIARISATION USING SPEAKER CHANNELS
    Milner, Rosanna
    Hain, Thomas
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4925 - 4929
  • [7] Speaker overlap detection with prosodic features for speaker diarisation
    Zelenak, M.
    Hernando, J.
    [J]. IET SIGNAL PROCESSING, 2012, 6 (08) : 798 - 804
  • [8] Strategies to Improve a Speaker Diarisation Tool
    Tavarez, David
    Navas, Eva
    Erro, Daniel
    Saratxaga, Ibon
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 4117 - 4121
  • [9] Spot the conversation: speaker diarisation in the wild
    Chung, Joon Son
    Huh, Jaesung
    Nagrani, Arsha
    Afouras, Triantafyllos
    Zisserman, Andrew
    [J]. INTERSPEECH 2020, 2020, : 299 - 303
  • [10] Audio-Visual Synchronisation for Speaker Diarisation
    Garau, Giulia
    Dielmann, Alfred
    Bourlard, Herve
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2662 - +