Deep Speaker Embedding with Long Short Term Centroid Learning for Text-independent Speaker Verification

被引:3
|
作者
Peng, Junyi [1 ]
Gu, Rongzhi [1 ]
Zou, Yuexian [1 ,2 ]
机构
[1] Peking Univ, Sch ECE, ADSPLAB, Shenzhen, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
来源
关键词
speaker verification; speaker embedding; speaker centroid; x-vectors; MARGIN SOFTMAX;
D O I
10.21437/Interspeech.2020-2470
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
Recently, speaker verification systems using deep neural networks have shown their effectiveness on large scale datasets. The widely used pairwise loss functions only consider the discrimination within a mini-batch data (short-term), while either the speaker identity information or the whole training dataset is not fully exploited. Thus, these pairwise comparisons may suffer from the interferences and variances brought by speaker-unrelated factors. To tackle this problem, we introduce the speaker identity information to form long-term speaker embedding centroids, which are determined by all the speakers in the training set. During the training process, each centroid dynamically accumulates the statistics of all samples belonging to a specific speaker. Since the long-term speaker embedding centroids are associated with a wide range of training samples, these centroids have the potential to be more robust and discriminative. Finally, these centroids are employed to construct a loss function, named long short term speaker loss (LSTSL). The proposed LSTSL constrains that the distances between samples and centroid from the same speaker are compact while those from different speakers are dispersed. Experiments are conducted on VoxCeleb1 and VoxCeleb2. Results on the VoxCeleb1 dataset demonstrate the effectiveness of our proposed LSTSL.
引用
下载
收藏
页码:3246 / 3250
页数:5
相关论文
共 50 条
  • [1] Deep Speaker Feature Learning for Text-independent Speaker Verification
    Li, Lantian
    Chen, Yixiang
    Shi, Zing
    Tang, Zhiyuan
    Wang, Dong
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1542 - 1546
  • [2] DEEP SPEAKER EMBEDDING LEARNING WITH MULTI-LEVEL POOLING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Tang, Yun
    Ding, Guohong
    Huang, Jing
    He, Xiaodong
    Zhou, Bowen
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6116 - 6120
  • [3] On Metric-based Deep Embedding Learning for Text-Independent Speaker Verification
    Kashani, Hamidreza Baradaran
    Reza, Shaghayegh
    Rezaei, Iman Sarraf
    2020 6TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2020,
  • [4] Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification
    Wang, Shuai
    Huang, Zili
    Qian, Yanmin
    Yu, Kai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1686 - 1696
  • [5] Improving the Generalized Performance of Deep Embedding for Text-Independent Speaker Verification
    Li, Rongjin
    Li, Lin
    Hong, Qingyang
    Guo, Huiyang
    Zhao, Miao
    PROCEEDINGS OF 2018 12TH IEEE INTERNATIONAL CONFERENCE ON ANTI-COUNTERFEITING, SECURITY, AND IDENTIFICATION (ASID), 2018, : 21 - 25
  • [6] TEXT-INDEPENDENT SPEAKER VERIFICATION WITH ADVERSARIAL LEARNING ON SHORT UTTERANCES
    Liu, Kai
    Zhou, Huan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6569 - 6573
  • [7] Neural Embedding Extractors for Text-Independent Speaker Verification
    Alam, Jahangir
    Kang, Woohyun
    Fathan, Abderrahim
    SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 10 - 23
  • [8] A Study on Angular Based Embedding Learning for Text-independent Speaker Verification
    Chen, Zhiyong
    Ren, Zongze
    Xu, Shugong
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 445 - 449
  • [9] Sequential Speaker Embedding and Transfer Learning for Text-Independent Speaker Identification
    Hong, Qian-Bei
    Wu, Chung-Hsien
    Su, Ming-Hsiang
    Wang, Hsin-Min
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 827 - 832
  • [10] Deep multi-metric learning for text-independent speaker verification
    Xu, Jiwei
    Wang, Xinggang
    Feng, Bin
    Liu, Wenyu
    NEUROCOMPUTING, 2020, 410 : 394 - 400