Deep Speaker Embedding with Long Short Term Centroid Learning for Text-independent Speaker Verification

Cited by: 3
Authors
Peng, Junyi [1 ]
Gu, Rongzhi [1 ]
Zou, Yuexian [1 ,2 ]
Affiliations
[1] Peking Univ, Sch ECE, ADSPLAB, Shenzhen, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
Source
INTERSPEECH 2020
Keywords
speaker verification; speaker embedding; speaker centroid; x-vectors; MARGIN SOFTMAX;
DOI
10.21437/Interspeech.2020-2470
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104 ; 100213 ;
Abstract
Recently, speaker verification systems using deep neural networks have shown their effectiveness on large-scale datasets. The widely used pairwise loss functions only consider the discrimination within a mini-batch of data (short-term), so neither the speaker identity information nor the whole training dataset is fully exploited. Thus, these pairwise comparisons may suffer from the interference and variance brought by speaker-unrelated factors. To tackle this problem, we introduce the speaker identity information to form long-term speaker embedding centroids, which are determined by all the speakers in the training set. During the training process, each centroid dynamically accumulates the statistics of all samples belonging to a specific speaker. Since the long-term speaker embedding centroids are associated with a wide range of training samples, they have the potential to be more robust and discriminative. Finally, these centroids are employed to construct a loss function, named long short term speaker loss (LSTSL). The proposed LSTSL constrains the distances between samples and the centroid of the same speaker to be compact, while those to the centroids of different speakers are dispersed. Experiments are conducted on VoxCeleb1 and VoxCeleb2. Results on the VoxCeleb1 dataset demonstrate the effectiveness of the proposed LSTSL.
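The centroid accumulation and distance constraints described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the exponential moving-average update and the specific squared-distance pull plus margin-based push are assumptions standing in for the accumulation rule and loss form actually proposed.

```python
import numpy as np

def update_centroids(centroids, embeddings, labels, momentum=0.9):
    """Dynamically accumulate per-speaker centroids from a mini-batch.
    An exponential moving average (assumed here) stands in for the
    paper's long-term statistics accumulation."""
    for emb, spk in zip(embeddings, labels):
        centroids[spk] = momentum * centroids[spk] + (1 - momentum) * emb
    return centroids

def lstsl(embeddings, labels, centroids, margin=1.0):
    """Sketch of a long short term speaker loss: pull each sample toward
    its own speaker's centroid, push it at least `margin` away from the
    centroids of all other speakers."""
    loss = 0.0
    for emb, spk in zip(embeddings, labels):
        d = np.linalg.norm(centroids - emb, axis=1)  # distance to every centroid
        pull = d[spk] ** 2                           # compactness to own centroid
        mask = np.ones(len(centroids), dtype=bool)
        mask[spk] = False
        push = np.maximum(0.0, margin - d[mask]).sum()  # dispersion from others
        loss += pull + push
    return loss / len(embeddings)
```

A sample that sits on its own centroid and is farther than the margin from every other centroid contributes zero loss; moving it toward another speaker's centroid raises both the pull and push terms.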
Pages: 3246-3250 (5 pages)
Related papers (50 total)
  • [41] Zhu, Yingke; Mak, Brian: Bayesian Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2023, 31: 1000-1012.
  • [42] Hou, FL; Wang, BX: Text-independent speaker verification using speaker clustering and support vector machines. 2002 6th International Conference on Signal Processing Proceedings, Vols I and II, 2002: 456-459.
  • [43] Tu, Youzhi; Lin, Weiwei; Mak, Man-Wai: A Survey on Text-Dependent and Text-Independent Speaker Verification. IEEE Access, 2022, 10: 99038-99049.
  • [44] Charan, Rishi; Manisha, A.; Karthik, R.; Kumar, Rajesh M.: A text-independent speaker verification model: A comparative analysis. Proceedings of 2017 International Conference on Intelligent Computing and Control (I2C2), 2017.
  • [45] Chen, Liping; Lee, Kong Aik; Ma, Bin; Guo, Wu; Li, Haizhou; Dai, Li Rong: Channel Adaptation of PLDA for Text-Independent Speaker Verification. 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015: 5251-5255.
  • [46] Zhu, Lei; Zheng, Rong; Xu, Bo: Residual Factor Analysis for Text-independent Speaker Verification. Proceedings of the 2009 Chinese Conference on Pattern Recognition and the First CJK Joint Workshop on Pattern Recognition, Vols 1 and 2, 2009: 964-968.
  • [47] He, Junjie; He, Jing; Zhu, Liangjin: Text-Independent Speaker Verification Based on Triplet Loss. Proceedings of 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC 2020), 2020: 2385-2388.
  • [48] Chiba University, Chiba, Japan: Pseudo speaker models for text-independent speaker verification using rank threshold. NLP-KE - Proc. Int. Conf. Nat. Lang. Process. Knowl. Eng.: 265-268.
  • [49] Zhou, Tianyan; Zhao, Yong; Li, Jinyu; Gong, Yifan; Wu, Jian: CNN with Phonetic Attention for Text-Independent Speaker Verification. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019), 2019: 718-725.
  • [50] Li, Jingyu; Lee, Tan: Text-Independent Speaker Verification with Dual Attention Network. INTERSPEECH 2020, 2020: 956-960.