DeltaVLAD: An efficient optimization algorithm to discriminate speaker embedding for text-independent speaker verification

Cited by: 0

Authors:

Guo, Xin [1]

Luo, Chengfang [2]

Deng, Aiwen [2]

Deng, Feiqi [2]

Affiliations:

[1] Guangdong Commun Polytech, Guangzhou 510650, Peoples R China

[2] South China Univ Technol, Sch Automat Sci & Engn, Guangzhou 510641, Peoples R China

Source:

AIMS MATHEMATICS | 2022 / Volume 7 / Issue 04

Funding:

National Natural Science Foundation of China;

Keywords:

speaker verification; text-independent; difference; DeltaVLAD; margin-based softmax loss function; few-shot learning-based loss function; MARGIN SOFTMAX;

DOI:

10.3934/math.2022355

Chinese Library Classification number:

O29 [Applied Mathematics];

Discipline classification code:

070104;

Abstract:

Text-independent speaker verification aims to determine whether two given utterances in an open-set task originate from the same speaker. This paper explores ways to make speaker embeddings more discriminative for speaker verification. First, a difference operation is applied to the speaker features in the coding layer, forming the DeltaVLAD layer. The frame-level speaker representation is extracted by a deep neural network, and differential operations compute the dynamic changes between frames, which helps capture subtle changes in the voiceprint. In addition, NeXtVLAD is adopted to split the frame-level features into multiple subspaces before aggregation and then perform the VLAD operation in each subspace, which can significantly reduce the number of parameters and improve performance. Second, a margin-based softmax loss function is combined with a few-shot learning-based loss function to obtain more discriminative speaker embeddings. Finally, for a fair comparison, experiments are conducted on VoxCeleb1; the results show superior speaker verification performance and new state-of-the-art results.
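The two ideas named in the abstract, frame-to-frame differences and VLAD-style aggregation, can be illustrated with a minimal NumPy sketch. This is not the authors' DeltaVLAD layer: the function names `frame_deltas` and `vlad_aggregate`, the distance-based soft assignment, the `alpha` temperature, and the random inputs are all assumptions made for illustration. The NeXtVLAD subspace grouping and the loss functions are omitted.

```python
import numpy as np


def frame_deltas(frames: np.ndarray) -> np.ndarray:
    """First-order difference between consecutive frames: (T, D) -> (T-1, D)."""
    return frames[1:] - frames[:-1]


def vlad_aggregate(frames: np.ndarray, centers: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Soft-assignment VLAD (illustrative): (T, D) frames, (K, D) centers -> (K*D,)."""
    # residuals[t, k] = frames[t] - centers[k]
    residuals = frames[:, None, :] - centers[None, :, :]        # (T, K, D)
    # Assumed soft assignment: softmax over clusters of negative squared distance
    logits = -alpha * np.sum(residuals ** 2, axis=2)            # (T, K)
    logits = logits - logits.max(axis=1, keepdims=True)         # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted sum of residuals for each cluster
    vlad = np.einsum("tk,tkd->kd", weights, residuals)          # (K, D)
    # Normalize each cluster's vector, then the flattened descriptor
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)
    flat = vlad.ravel()
    return flat / (np.linalg.norm(flat) + 1e-12)


# Hypothetical data: 50 frames of 8-dimensional features, 4 cluster centers
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 8))
centers = rng.standard_normal((4, 8))

# Aggregate the frame-to-frame changes into one utterance-level vector
embedding = vlad_aggregate(frame_deltas(frames), centers)
print(embedding.shape)
```

In a trained system the cluster centers and assignment weights would be learned jointly with the network; here they are fixed random values so that the shapes and the order of operations are easy to follow.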


Pages: 6381-6395

Page count: 15

