DISCRIMINATIVE NEURAL CLUSTERING FOR SPEAKER DIARISATION

Cited by: 13
Authors
Li, Qiujia [1]
Kreyssig, Florian L. [1]
Zhang, Chao [1]
Woodland, Philip C. [1]
Affiliations
[1] Univ Cambridge, Engn Dept, Trumpington St, Cambridge CB2 1PZ, England
Keywords
speaker diarisation; supervised clustering; discriminative neural clustering; Transformer; DIARIZATION;
DOI
10.1109/SLT48900.2021.9383617
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem. Compared to traditional unsupervised clustering algorithms, DNC learns clustering patterns from training data without requiring an explicit definition of a similarity measure. An implementation of DNC based on the Transformer architecture is shown to be effective on a speaker diarisation task using the challenging AMI dataset. Since AMI contains only 147 complete meetings as individual input sequences, data scarcity is a significant issue for training a Transformer model for DNC. Accordingly, this paper proposes three data augmentation schemes: sub-sequence randomisation, input vector randomisation, and Diaconis augmentation, which generates new data samples by rotating the entire input sequence of L2-normalised speaker embeddings. Experimental results on AMI show that DNC achieves a reduction in speaker error rate (SER) of 29.4% relative to spectral clustering.
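The Diaconis augmentation mentioned in the abstract rotates every L2-normalised speaker embedding in a sequence by the same transform. The following is a minimal sketch of that idea, not the authors' implementation: the abstract does not say how the rotation is sampled, so the Gram-Schmidt sampler, the function names, and the toy dimensions (5 embeddings of size 4) are all assumptions made for illustration. The sampled matrix is orthogonal, so it may include a reflection as well as a rotation.

```python
import math
import random


def random_orthogonal_matrix(dim, rng):
    """Sample a dim x dim orthogonal matrix (assumed sampler: Gram-Schmidt on Gaussian vectors)."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the rows accepted so far.
        for r in rows:
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # resample in the unlikely degenerate case
            rows.append([a / norm for a in v])
    return rows


def l2_normalise(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec]


def rotate_sequence(embeddings, matrix):
    """Apply the same orthogonal matrix to every embedding in the sequence."""
    return [[sum(m * x for m, x in zip(row, e)) for row in matrix] for e in embeddings]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Toy data: 5 unit-norm "speaker embeddings" of dimension 4 (hypothetical values).
rng = random.Random(0)
sequence = [l2_normalise([rng.gauss(0.0, 1.0) for _ in range(4)]) for _ in range(5)]

# One transform is shared by the whole sequence, giving a new training sample.
rotation = random_orthogonal_matrix(4, rng)
augmented = rotate_sequence(sequence, rotation)
```

Because one orthogonal matrix is shared across the sequence, every embedding stays unit-norm and all pairwise cosine similarities are unchanged, so the cluster labels of the original sequence remain valid for the augmented one.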
Pages: 574 - 581
Number of pages: 8
Related papers
50 in total
  • [41] Discriminative training of GMM for speaker identification
    delAlamo, CM
    Gil, FJC
    Munilla, CDL
    Gomez, LH
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 89 - 92
  • [42] Delay Based Optimisation of an Integrated Online Call Recording Speaker Diarisation and Identification System
    Melov, Aleksandar
    Gerazov, Branislav
    Ivanovski, Zoran
    [J]. 17TH IEEE INTERNATIONAL CONFERENCE ON SMART TECHNOLOGIES - IEEE EUROCON 2017 CONFERENCE PROCEEDINGS, 2017, : 307 - 311
  • [43] A Paralinguistic Approach To Speaker Diarisation Using Age, Gender, Voice Likability and Personality Traits
    Zhang, Yue
    Weninger, Felix
    Liu, Boqing
    Schmitt, Maximilian
    Eyben, Florian
    Schuller, Bjorn
    [J]. PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 387 - 392
  • [44] DISCRIMINATIVE EXEMPLAR CLUSTERING
    Yang, Yingzhen
    Liang, Feng
    Huang, Thomas S.
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [45] Discriminative Subspace Clustering
    Zografos, Vasileios
    Ellis, Liam
    Mester, Rudolf
    [J]. 2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 2107 - 2114
  • [46] Regularized discriminative clustering
    Kaski, S
    Sinkkonen, J
    Klami, A
    [J]. 2003 IEEE XIII WORKSHOP ON NEURAL NETWORKS FOR SIGNAL PROCESSING - NNSP'03, 2003, : 289 - 298
  • [47] Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification
    Wang, Shuai
    Huang, Zili
    Qian, Yanmin
    Yu, Kai
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1686 - 1696
  • [48] Learning Essential Speaker Sub-space Using Hetero-Associative Neural Networks for Speaker Clustering
    Ikbal, Shajith
    Visweswariah, Karthik
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 28 - 31
  • [49] DEEP NEURAL NETWORK BASED DISCRIMINATIVE TRAINING FOR I-VECTOR/PLDA SPEAKER VERIFICATION
    Zheng Tieran
    Han Jiqing
    Zheng Guibin
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5354 - 5358
  • [50] Speaker recognition via nonlinear phonetic and speaker-discriminative features
    Stoll, Lara
    Frankel, Joe
    Mirghafori, Nikki
    [J]. ADVANCES IN NONLINEAR SPEECH PROCESSING, 2007, 4885 : 114 - 123