An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions

Cited by: 5

Authors:

Liu, Ying [1]

Song, Yan [1]

Jiang, Yiheng [1]

McLoughlin, Ian [1,2]

Liu, Lin [3]

Dai, Lirong [1]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Peoples R China

[2] Singapore Inst Technol, ICT Cluster, Singapore, Singapore

[3] iFLYTEK CO LTD, iFLYTEK Res, Hefei 230088, Anhui, Peoples R China

Source:

INTERSPEECH 2020 | 2020

Funding:

National Natural Science Foundation of China;

Keywords:

speaker verification; mutual information learning; attentive bilinear pooling; multi-task framework;

DOI:

10.21437/Interspeech.2020-1922

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

Speaker verification methods based on deep embedding learning have attracted significant recent research interest due to their superior performance. Existing methods mainly focus on designing frame-level feature extraction structures, utterance-level aggregation methods and loss functions to learn discriminative speaker embeddings. The scores of verification trials are then computed using cosine distance or Probabilistic Linear Discriminant Analysis (PLDA) classifiers. This paper proposes an effective speaker recognition method based on joint identification and verification supervisions, inspired by multi-task learning frameworks. Specifically, a deep architecture with a convolutional feature extractor, attentive pooling and two classifier branches is presented. The first, an identification branch, is trained with additive margin softmax loss (AM-Softmax) to classify the speaker identities. The second, a verification branch, trains a discriminator with binary cross entropy loss (BCE) to optimize a new triplet-based mutual information. To balance the two losses during different training stages, a ramp-up/ramp-down weighting scheme is employed. Furthermore, an attentive bilinear pooling method is proposed to improve the effectiveness of embeddings. Extensive experiments conducted on VoxCeleb1 show that the proposed method achieves a 22% relative reduction in equal error rate (EER) compared to the baseline system using identification supervision only.
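
To make the identification branch's objective concrete, the following is a minimal sketch of the general AM-Softmax formulation for a single sample, written in plain Python. It is not taken from the paper: the function names, the toy class weights, and the default `scale` and `margin` values are illustrative assumptions, and the paper's actual settings are not stated in this record.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def am_softmax_loss(embedding, class_weights, label, scale=30.0, margin=0.35):
    """Additive margin softmax loss for one sample.

    embedding:     speaker embedding (list of floats)
    class_weights: one weight vector per speaker class
    label:         index of the true speaker
    scale, margin: hypothetical defaults, not the paper's reported values
    """
    unit_embedding = l2_normalize(embedding)
    cosines = [
        sum(a * b for a, b in zip(unit_embedding, l2_normalize(w)))
        for w in class_weights
    ]
    # Subtract the margin from the target-class cosine only, then scale.
    logits = [
        scale * (c - margin) if j == label else scale * c
        for j, c in enumerate(cosines)
    ]
    # Numerically stable log-sum-exp, then cross entropy for the true class.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[label]


if __name__ == "__main__":
    toy_weights = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical speakers
    print(am_softmax_loss([0.9, 0.1], toy_weights, label=0))
```

The margin lowers the target-class logit, so a sample must sit closer to its own class direction to reach the same loss as under plain softmax, which is what encourages more discriminative embeddings.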

Pages: 3007-3011

Number of pages: 5
