Deep Speaker Embedding with Long Short Term Centroid Learning for Text-independent Speaker Verification

被引:3
|
作者
Peng, Junyi [1 ]
Gu, Rongzhi [1 ]
Zou, Yuexian [1 ,2 ]
机构
[1] Peking Univ, Sch ECE, ADSPLAB, Shenzhen, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
来源
关键词
speaker verification; speaker embedding; speaker centroid; x-vectors; MARGIN SOFTMAX;
D O I
10.21437/Interspeech.2020-2470
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
Recently, speaker verification systems using deep neural networks have shown their effectiveness on large scale datasets. The widely used pairwise loss functions only consider the discrimination within a mini-batch data (short-term), while either the speaker identity information or the whole training dataset is not fully exploited. Thus, these pairwise comparisons may suffer from the interferences and variances brought by speaker-unrelated factors. To tackle this problem, we introduce the speaker identity information to form long-term speaker embedding centroids, which are determined by all the speakers in the training set. During the training process, each centroid dynamically accumulates the statistics of all samples belonging to a specific speaker. Since the long-term speaker embedding centroids are associated with a wide range of training samples, these centroids have the potential to be more robust and discriminative. Finally, these centroids are employed to construct a loss function, named long short term speaker loss (LSTSL). The proposed LSTSL constrains that the distances between samples and centroid from the same speaker are compact while those from different speakers are dispersed. Experiments are conducted on VoxCeleb1 and VoxCeleb2. Results on the VoxCeleb1 dataset demonstrate the effectiveness of our proposed LSTSL.
引用
下载
收藏
页码:3246 / 3250
页数:5
相关论文
共 50 条
  • [31] IMPROVING DEEP CNN NETWORKS WITH LONG TEMPORAL CONTEXT FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Zhao, Yong
    Zhou, Tianyan
    Chen, Zhuo
    Wu, Jian
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6834 - 6838
  • [32] Speaker adaptive cohort selection for Tnorm in text-independent speaker verification
    Sturim, DE
    Reynolds, DA
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 741 - 744
  • [33] Angular Softmax for Short-Duration Text-independent Speaker Verification
    Huang, Zili
    Wang, Shuai
    Yu, Kai
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3623 - 3627
  • [34] PARTIAL AUC OPTIMIZATION BASED DEEP SPEAKER EMBEDDINGS WITH CLASS-CENTER LEARNING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Bai, Zhongxin
    Zhang, Xiao-Lei
    Chen, Jingdong
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6819 - 6823
  • [35] CONTRASTIVE SELF-SUPERVISED LEARNING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Zhang, Haoran
    Zou, Yuexian
    Wang, Helin
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6713 - 6717
  • [36] Angular Margin Centroid Loss for Text-independent Speaker Recognition
    Wei, Yuheng
    Du, Junzhao
    Liu, Hui
    INTERSPEECH 2020, 2020, : 3820 - 3824
  • [37] End-to-End Feature Learning for Text-Independent Speaker Verification
    Chen, Fangzhou
    Bian, Tengyue
    Xu, Li
    PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019), 2019, : 3949 - 3954
  • [38] A novel text-independent speaker verification method based on the global speaker model
    Zhang, YY
    Zhang, D
    Zhu, XY
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2000, 30 (05): : 598 - 602
  • [39] Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification
    Shum, Stephen
    Dehak, Najim
    Dehak, Reda
    Glass, James R.
    ODYSSEY 2010: THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP, 2010, : 76 - 82
  • [40] Deep Embedding Learning for Text-Dependent Speaker Verification
    Zhang, Peng
    Hu, Peng
    Zhang, Xueliang
    INTERSPEECH 2020, 2020, : 3461 - 3465