Triplet Loss Based Cosine Similarity Metric Learning for Text-independent Speaker Recognition

Cited by: 32
Authors
Novoselov, Sergey [1, 2]
Shchemelinin, Vadim [1, 2]
Shulipa, Andrey [2]
Kozlov, Alexandr [1]
Kremnev, Ivan [1]
Affiliations
[1] STC Ltd, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
Keywords
speaker recognition; cosine similarity metric learning; speaker embeddings
DOI
10.21437/Interspeech.2018-1209
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep neural network (DNN) based speaker embeddings have become increasingly popular in the text-independent speaker recognition task. In contrast to a generatively trained i-vector extractor, a DNN speaker embedding extractor is usually trained discriminatively in a closed-set classification scenario using softmax. The problem addressed in this paper is choosing a DNN-based speaker embedding backend for speaker verification scoring. There are several options for performing speaker verification in the DNN embedding space. One is to score with a simple heuristic speaker similarity metric (e.g. the cosine metric). As with i-vector based systems, standard Linear Discriminant Analysis (LDA) followed by Probabilistic Linear Discriminant Analysis (PLDA) can be used for segregating speaker information. Alternatively, a discriminative metric learning approach can be considered. This work demonstrates that the performance of systems based on deep speaker embeddings can be improved by using Cosine Similarity Metric Learning (CSML) with the triplet loss training scheme. Results obtained on the Speakers in the Wild and NIST SRE 2016 evaluation sets demonstrate the superiority and robustness of CSML-based systems.
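The abstract names two scoring ideas: plain cosine scoring of embeddings, and a cosine metric learned with a triplet loss. The following is a minimal sketch of both, not the authors' implementation. It assumes CSML takes the common form of a learned linear projection applied before the cosine similarity; the function names, the `matrix` parameter, and the margin value of 0.2 are hypothetical and do not come from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def project(matrix, vec):
    """Apply a linear projection (list of rows) to an embedding.

    Assumption: the learned metric is a linear map applied before scoring.
    """
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def triplet_cosine_loss(anchor, positive, negative, matrix, margin=0.2):
    """Hinge-style triplet loss on cosine similarities in the projected space.

    The loss is zero once the same-speaker pair (anchor, positive) scores
    at least `margin` higher than the different-speaker pair
    (anchor, negative). The margin value is illustrative only.
    """
    a, p, n = (project(matrix, v) for v in (anchor, positive, negative))
    return max(0.0, margin - cosine_similarity(a, p) + cosine_similarity(a, n))


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a trained projection
    print(triplet_cosine_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], identity))
```

In training, the entries of `matrix` would be updated by gradient descent over many such triplets; with the identity matrix the sketch reduces to plain cosine scoring.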
Pages: 2242-2246
Number of pages: 5