An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions

Cited by: 5
Authors
Liu, Ying [1]
Song, Yan [1]
Jiang, Yiheng [1]
McLoughlin, Ian [1, 2]
Liu, Lin [3]
Dai, Lirong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Peoples R China
[2] Singapore Inst Technol, ICT Cluster, Singapore, Singapore
[3] iFLYTEK CO LTD, iFLYTEK Res, Hefei 230088, Anhui, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
speaker verification; mutual information learning; attentive bilinear pooling; multi-task framework;
DOI
10.21437/Interspeech.2020-1922
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Speaker verification methods based on deep embedding learning have attracted significant recent research interest due to their superior performance. Existing methods mainly focus on designing frame-level feature extraction structures, utterance-level aggregation methods and loss functions to learn discriminative speaker embeddings. The scores of verification trials are then computed using cosine distance or Probabilistic Linear Discriminant Analysis (PLDA) classifiers. This paper proposes an effective speaker recognition method based on joint identification and verification supervisions, inspired by multi-task learning frameworks. Specifically, a deep architecture with a convolutional feature extractor, attentive pooling and two classifier branches is presented. The first, an identification branch, is trained with additive margin softmax (AM-Softmax) loss to classify the speaker identities. The second, a verification branch, trains a discriminator with binary cross-entropy (BCE) loss to optimize a new triplet-based mutual information. To balance the two losses during different training stages, a ramp-up/ramp-down weighting scheme is employed. Furthermore, an attentive bilinear pooling method is proposed to improve the effectiveness of embeddings. Extensive experiments on VoxCeleb1 show that the proposed method achieves a 22% relative reduction in equal error rate (EER) compared to the baseline system using identification supervision only.
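For readers unfamiliar with the identification loss named in the abstract, the following is a minimal NumPy sketch of the general AM-Softmax formulation: a fixed margin is subtracted from the target-class cosine similarity before a scaled softmax cross-entropy. It is not the authors' implementation. The function name `am_softmax_loss` and the default values `scale=30.0` and `margin=0.2` are illustrative assumptions and are not taken from this record.

```python
import numpy as np


def am_softmax_loss(embeddings, weights, labels, scale=30.0, margin=0.2):
    """Mean AM-Softmax loss over a batch (illustrative sketch).

    embeddings: (N, D) speaker embeddings
    weights:    (C, D) one weight vector per speaker class
    labels:     (N,) integer class indices in [0, C)
    scale, margin: assumed hyperparameter values, not from the paper
    """
    # L2-normalise both sides so the dot product is a cosine similarity.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = e @ w.T  # shape (N, C)

    # Subtract the additive margin from the target-class cosine only.
    idx = np.arange(len(labels))
    logits = scale * cos
    logits[idx, labels] = scale * (cos[idx, labels] - margin)

    # Numerically stable log-softmax followed by cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[idx, labels].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))      # 8 utterances, 16-dim embeddings
    cls = rng.normal(size=(4, 16))      # 4 hypothetical speakers
    spk = rng.integers(0, 4, size=8)
    print(am_softmax_loss(emb, cls, spk))
```

With `margin=0.0` the function reduces to an ordinary softmax cross-entropy on scaled cosine similarities. A positive margin makes the target class harder to satisfy, which is what encourages more compact speaker clusters.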
Pages: 3007-3011
Number of pages: 5