Triplet Based Embedding Distance and Similarity Learning for Text-independent Speaker Verification

被引:0
|
作者
Ren, Zongze [1 ]
Chen, Zhiyong [1 ]
Xu, Shugong [1 ]
机构
[1] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai, Peoples R China
关键词
speaker verification; deep neural network; similarity learning;
D O I
暂无
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Speaker embeddings become growing popular in the text-independent speaker verification task. In this paper, we propose two improvements during the training stage. The improvements are both based on triplet because the training stage and the evaluation stage of the baseline x-vector system focus on different aims. Firstly, we introduce triplet loss for optimizing the Euclidean distances between embeddings while minimizing the multi-class cross entropy loss. Secondly, we design an embedding similarity measurement network for controlling the similarity between the two selected embeddings. We further jointly train the two new methods with the original network and achieve state-of-the-art. The multi-task training synergies are shown with a 9% reduction equal error rate (EER) and detected cost function (DCF) on the 2016 NIST Speaker Recognition Evaluation (SRE) Test Set.
引用
收藏
页码:558 / 562
页数:5
相关论文
共 50 条
  • [1] A Study on Angular Based Embedding Learning for Text-independent Speaker Verification
    Chen, Zhiyong
    Ren, Zongze
    Xu, Shugong
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 445 - 449
  • [2] Text-Independent Speaker Verification Based on Triplet Loss
    He, Junjie
    He, Jing
    Zhu, Liangjin
    [J]. PROCEEDINGS OF 2020 IEEE 4TH INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2020), 2020, : 2385 - 2388
  • [3] On Metric-based Deep Embedding Learning for Text-Independent Speaker Verification
    Kashani, Hamidreza Baradaran
    Reza, Shaghayegh
    Rezaei, Iman Sarraf
    [J]. 2020 6TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2020,
  • [4] Triplet Loss Based Cosine Similarity Metric Learning for Text-independent Speaker Recognition
    Novoselov, Sergey
    Shchemelinin, Vadim
    Shulipa, Andrey
    Kozlov, Alexandr
    Kremnev, Ivan
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2242 - 2246
  • [5] Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification
    Shum, Stephen
    Dehak, Najim
    Dehak, Reda
    Glass, James R.
    [J]. ODYSSEY 2010: THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP, 2010, : 76 - 82
  • [6] Neural Embedding Extractors for Text-Independent Speaker Verification
    Alam, Jahangir
    Kang, Woohyun
    Fathan, Abderrahim
    [J]. SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 10 - 23
  • [7] A Novel Scoring Method Based on Distance Calculation for Similarity Measurement in Text-Independent Speaker Verification
    Hourri, Soufiane
    Kharroubi, Jamal
    [J]. SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING IN DATA SCIENCES (ICDS2018), 2019, 148 : 256 - 265
  • [8] DEEP SPEAKER EMBEDDING LEARNING WITH MULTI-LEVEL POOLING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Tang, Yun
    Ding, Guohong
    Huang, Jing
    He, Xiaodong
    Zhou, Bowen
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6116 - 6120
  • [9] Text-Independent Speaker Verification Based on Information Theoretic Learning
    Memon, Sheeraz
    Khanzada, Tariq Jameel Saifullah
    Bhatti, Sania
    [J]. MEHRAN UNIVERSITY RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY, 2011, 30 (03) : 457 - 468
  • [10] Deep Speaker Embedding with Long Short Term Centroid Learning for Text-independent Speaker Verification
    Peng, Junyi
    Gu, Rongzhi
    Zou, Yuexian
    [J]. INTERSPEECH 2020, 2020, : 3246 - 3250