PARTIAL AUC OPTIMIZATION BASED DEEP SPEAKER EMBEDDINGS WITH CLASS-CENTER LEARNING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

Citations: 0
Authors
Bai, Zhongxin [1, 2]
Zhang, Xiao-Lei [1, 2]
Chen, Jingdong [1, 2]
Affiliations
[1] Northwestern Polytech Univ, Ctr Intelligent Acoust & Immers Commun, Xian, Peoples R China
[2] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian, Peoples R China
Funding
Israel Science Foundation; National Science Foundation (USA);
Keywords
speaker verification; pAUC optimization; speaker centers; verification loss; RECOGNITION;
DOI
10.1109/icassp40776.2020.9053674
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Deep embedding based text-independent speaker verification has demonstrated superior performance to traditional methods in many challenging scenarios. Its loss functions can be generally categorized into two classes, i.e., verification and identification. The verification loss functions match the pipeline of speaker verification, but they are difficult to implement. Thus, most state-of-the-art deep embedding methods use the identification loss functions with softmax output units or their variants. In this paper, we propose a verification loss function, named the maximization of partial area under the receiver operating characteristic (ROC) curve (pAUC), for deep embedding based text-independent speaker verification. We also propose a class-center based training trial construction method to improve the training efficiency, which is critical for the proposed loss function to be comparable to the identification loss in performance. Experiments on the Speakers in the Wild (SITW) and NIST SRE 2016 datasets show that the proposed pAUC loss function is highly competitive with the state-of-the-art identification loss functions.
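The two ideas named in the abstract can be illustrated with a small NumPy sketch. It is not taken from the paper: the squared-hinge surrogate, the false-positive-rate window `alpha`/`beta`, the `margin` value, the cosine scoring, and both function names are assumptions chosen for illustration only. The first function scores target trials against only the hardest non-target trials (those ranked inside the chosen FPR window), and the second builds trials between each utterance embedding and per-speaker mean embeddings instead of between all utterance pairs.

```python
import numpy as np


def pauc_hinge_loss(pos_scores, neg_scores, alpha=0.0, beta=0.1, margin=0.5):
    """Squared-hinge surrogate for partial AUC over the FPR range [alpha, beta].

    Hypothetical helper: the surrogate and default values are assumptions.
    """
    pos = np.asarray(pos_scores, dtype=float)
    # Sort non-target scores from highest (hardest) to lowest.
    neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]
    # Keep only non-target trials whose rank falls inside the FPR window.
    lo = int(np.floor(alpha * len(neg)))
    hi = max(int(np.ceil(beta * len(neg))), lo + 1)
    hard_neg = neg[lo:hi]
    # Penalize every (target, non-target) pair whose score gap is below the margin.
    gaps = pos[:, None] - hard_neg[None, :]
    return float(np.mean(np.maximum(0.0, margin - gaps) ** 2))


def class_center_trials(embeddings, labels):
    """Cosine scores between each embedding and each speaker center.

    Hypothetical helper: centers are plain per-speaker means here.
    Returns (target_scores, nontarget_scores).
    """
    emb = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    speakers = np.unique(labels)
    centers = np.stack([emb[labels == s].mean(axis=0) for s in speakers])
    emb_n = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    cen_n = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    scores = emb_n @ cen_n.T  # shape: (n_utterances, n_speakers)
    is_target = labels[:, None] == speakers[None, :]
    return scores[is_target], scores[~is_target]


# Toy usage: two speakers with two utterances each.
toy_emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_lab = [0, 0, 1, 1]
tar, non = class_center_trials(toy_emb, toy_lab)
toy_loss = pauc_hinge_loss(tar, non, alpha=0.0, beta=0.5)
```

With N utterances and K speakers, scoring against centers yields N x K trials rather than on the order of N^2 utterance pairs, which is one plausible reading of the training-efficiency claim in the abstract.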
Pages: 6819-6823
Page count: 5
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 293 - +