DEEP SPEAKER EMBEDDING LEARNING WITH MULTI-LEVEL POOLING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

Cited by: 0

Authors:

Tang, Yun [1]

Ding, Guohong [1]

Huang, Jing [1]

He, Xiaodong [1]

Zhou, Bowen [1]

Affiliations:

[1] JD AI Res, 675 East Middlefield Rd, Mountain View, CA 94043 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

Speaker recognition; x-vector; multi-level pooling;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both a time delay neural network (TDNN) and long short-term memory (LSTM) neural networks to generate complementary speaker information at different levels; (2) a multi-level pooling strategy to collect speaker information from both TDNN and LSTM layers; (3) a regularization scheme on the speaker embedding extraction layer to make the extracted embeddings suitable for the following fusion step. The synergy of these improvements is shown on the NIST SRE 2016 eval test (with a 19% EER reduction) and the SRE 2018 dev test (with a 9% EER reduction), as well as a reduction of more than 10% in DCF scores on these two test sets over the x-vector baseline.
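The multi-level pooling idea in item (2) can be illustrated with a minimal sketch. The abstract does not state which pooling function is used, so this example assumes x-vector-style statistics pooling (per-dimension mean and standard deviation over time) applied to each layer, with the results concatenated. The function names `stats_pool` and `multi_level_pool` and the toy frame values are hypothetical and are not taken from the paper.

```python
import math

def stats_pool(frames):
    """Per-dimension mean and (population) standard deviation over time.

    `frames` is a list of equal-length frame vectors from one layer.
    Assumption: x-vector-style statistics pooling, not confirmed by the abstract.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds

def multi_level_pool(tdnn_frames, lstm_frames):
    """Concatenate pooled statistics from two layers into one utterance-level vector."""
    return stats_pool(tdnn_frames) + stats_pool(lstm_frames)

# Toy frame-level outputs (hypothetical values), three frames per layer.
tdnn_out = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 2-dimensional TDNN features
lstm_out = [[0.0], [2.0], [4.0]]                 # 1-dimensional LSTM features

utterance_vec = multi_level_pool(tdnn_out, lstm_out)
print(len(utterance_vec))  # 2 * 2 TDNN statistics + 2 * 1 LSTM statistics
```

In this sketch the utterance-level vector grows with the number of pooled layers, which is one plausible reading of "collecting speaker information from both TDNN and LSTM layers"; the paper itself may combine the levels differently.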


Pages: 6116-6120

Page count: 5

