IMPROVED LARGE-MARGIN SOFTMAX LOSS FOR SPEAKER DIARISATION

Citations: 0
Authors
Fathullah, Y. [1 ]
Zhang, C. [1 ]
Woodland, P. C. [1 ]
Affiliation
[1] Univ Cambridge, Engn Dept, Cambridge, England
Keywords
Speaker diarisation; speaker embeddings; large-margin softmax; overlapping speech; diarization
DOI
10.1109/icassp40776.2020.9053373
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Speaker diarisation systems nowadays use embeddings generated from speech segments in a bottleneck layer, and these embeddings need to be discriminative for unseen speakers. It is well known that large-margin training can improve generalisation to unseen data, and its use in such open-set problems is widespread. This paper therefore introduces a general approach to the large-margin softmax loss, without any approximations, to improve the quality of speaker embeddings for diarisation. Furthermore, a novel and simple way to stabilise training when large-margin softmax is used is proposed. Finally, to combat the effect of overlapping speech, different training margins are used to reduce the negative effect that overlapping speech has on creating discriminative embeddings. Experiments on the AMI meeting corpus show that the use of large-margin softmax significantly improves the speaker error rate (SER). Using all hyper-parameters of the loss in a unified way yielded further improvements, reaching a relative SER reduction of 24.6% over the baseline. The best result, however, was achieved by training overlapping and single-speaker speech samples with different margins, giving an overall 29.5% SER reduction relative to the baseline.
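To make the idea concrete, the following is a minimal NumPy sketch of one common large-margin softmax variant (an additive cosine margin applied to the target class), not the paper's exact formulation; the function name, the `margin`/`scale` values, and the per-sample margin vector used to mimic different margins for overlapping vs. single-speaker segments are all illustrative assumptions.

```python
import numpy as np

def large_margin_softmax_loss(embeddings, weights, labels, margin=0.35, scale=30.0):
    """Additive-margin softmax cross-entropy (illustrative sketch, not the
    paper's exact loss).

    embeddings: (N, D) speaker embeddings from the bottleneck layer
    weights:    (C, D) one weight vector per training speaker
    labels:     (N,)   integer speaker identities
    margin:     scalar, or per-sample array of shape (N,) -- e.g. a smaller
                margin could be assigned to overlapping-speech samples
    """
    # L2-normalise embeddings and class weights so the logits are cosines
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = e @ w.T                                   # (N, C) cosine similarities
    # Subtract the margin from the target-class cosine only, which forces the
    # model to place each embedding well inside its own class region
    idx = np.arange(len(labels))
    cos[idx, labels] -= margin
    logits = scale * cos
    # Numerically stable log-softmax cross-entropy
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, labels].mean()
```

With `margin=0` this reduces to an ordinary (scaled, cosine-based) softmax cross-entropy; a positive margin makes the same inputs incur a higher loss, which is the pressure that produces more discriminative embeddings. Passing an array for `margin` assigns a different margin per sample, sketching the paper's idea of training overlapping and single-speaker segments with different margins.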
Pages: 7104-7108 (5 pages)