Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?

Cited by: 0

Authors:

Wang, Qiongqiong [1]

Lee, Kong Aik [1]

Liu, Tianchi [1,2]

Affiliations:

[1] A*STAR, Institute for Infocomm Research (I2R), Singapore, Singapore

[2] National University of Singapore, Department of Electrical and Computer Engineering, Singapore, Singapore

Source:

INTERSPEECH 2022 | 2022

Keywords:

speaker verification; large-margin softmax; cosine similarity; PLDA; ECAPA-TDNN;

DOI:

10.21437/Interspeech.2022-10055

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

The emergence of large-margin softmax cross-entropy losses in training deep speaker embedding neural networks has triggered a gradual shift from parametric back-ends to a simpler cosine similarity measure for speaker verification. Popular parametric back-ends include the probabilistic linear discriminant analysis (PLDA) and its variants. This paper investigates the properties of margin-based cross-entropy losses leading to such a shift, and aims to find scoring back-ends best suited for speaker verification. In addition, we revisit the pre-processing techniques which have been widely used in the past and assess their effectiveness on large-margin embeddings. Experiments on the state-of-the-art ECAPA-TDNN networks trained with various large-margin softmax cross-entropy losses show a substantial increment in intra-speaker compactness making the conventional PLDA superfluous. In this regard, we found that constraining the within-speaker covariance matrix could improve the performance of the PLDA. It is demonstrated through a series of experiments on the VoxCeleb-1 and SITW core-core test sets with 40.8% equal error rate (EER) reduction and 35.1% minimum detection cost (minDCF) reduction. It also outperforms cosine scoring consistently with reductions in EER and minDCF by 10.9% and 4.9%, respectively.
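For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch, not the authors' implementation. It shows cosine scoring of two embeddings, a simple threshold-sweep approximation of the EER, and a relative-reduction helper. The function names are hypothetical, the embeddings are plain Python lists, and it assumes the percentages quoted in the abstract are relative reductions.

```python
import math

def cosine_score(enroll, test):
    """Cosine similarity between two embedding vectors (plain Python lists)."""
    dot = sum(a * b for a, b in zip(enroll, test))
    norm_e = math.sqrt(sum(a * a for a in enroll))
    norm_t = math.sqrt(sum(b * b for b in test))
    return dot / (norm_e * norm_t)

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores."""
    best_gap, eer = None, None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # Miss: a target trial scored below the threshold.
        p_miss = sum(s < thr for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        p_fa = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if best_gap is None or gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer

def relative_reduction(baseline, improved):
    """Relative reduction in percent (assumed meaning of the quoted figures)."""
    return 100.0 * (baseline - improved) / baseline
```

Under this assumed reading, `relative_reduction(2.0, 1.0)` gives 50.0, so a back-end that halves the baseline EER would be reported as a 50% EER reduction. PLDA scoring is not sketched here, since the record gives no details of the constrained within-speaker covariance.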


Pages: 600-604

Page count: 5
