DISCRIMINATIVE NEURAL CLUSTERING FOR SPEAKER DIARISATION

Cited by: 13
Authors
Li, Qiujia [1]
Kreyssig, Florian L. [1]
Zhang, Chao [1]
Woodland, Philip C. [1]
Affiliations
[1] Univ Cambridge, Engn Dept, Trumpington St, Cambridge CB2 1PZ, England
Keywords
speaker diarisation; supervised clustering; discriminative neural clustering; Transformer; DIARIZATION;
DOI
10.1109/SLT48900.2021.9383617
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem. Compared to traditional unsupervised clustering algorithms, DNC learns clustering patterns from training data without requiring an explicit definition of a similarity measure. An implementation of DNC based on the Transformer architecture is shown to be effective on a speaker diarisation task using the challenging AMI dataset. Since AMI contains only 147 complete meetings as individual input sequences, data scarcity is a significant issue for training a Transformer model for DNC. Accordingly, this paper proposes three data augmentation schemes: sub-sequence randomisation, input vector randomisation, and Diaconis augmentation, which generates new data samples by rotating the entire input sequence of L2-normalised speaker embeddings. Experimental results on AMI show that DNC achieves a reduction in speaker error rate (SER) of 29.4% relative to spectral clustering.
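The abstract describes Diaconis augmentation as rotating the entire input sequence of L2-normalised speaker embeddings. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: the helper names (`l2_normalise`, `random_orthogonal`, `rotate_sequence`) are hypothetical, the embeddings are random toy vectors, and the transform is a random orthogonal matrix built by Gram-Schmidt. Such a matrix preserves norms and pairwise dot products, but the sketch does not enforce a determinant of +1, so it may include a reflection.

```python
import math
import random


def l2_normalise(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def random_orthogonal(dim, rng):
    """Build a random orthogonal matrix (list of orthonormal rows)
    by applying Gram-Schmidt to Gaussian random vectors.
    Assumption: this stands in for the random rotation; det = +1 is not enforced."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:
            # Remove the component of v that lies along an earlier row.
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        rows.append(l2_normalise(v))
    return rows


def rotate_sequence(embeddings, matrix):
    """Apply the same matrix to every embedding in the sequence."""
    return [
        [sum(m * x for m, x in zip(row, emb)) for row in matrix]
        for emb in embeddings
    ]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy data: 5 unit-length "speaker embeddings" of dimension 4 (illustrative only).
rng = random.Random(0)
sequence = [l2_normalise([rng.gauss(0.0, 1.0) for _ in range(4)]) for _ in range(5)]
matrix = random_orthogonal(4, rng)
augmented = rotate_sequence(sequence, matrix)
```

Because one matrix is applied to the whole sequence, the relative geometry between embeddings is unchanged, so the cluster labels of the original sequence remain valid for the augmented one.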
Pages: 574-581
Number of pages: 8