Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification

Cited by: 171

Authors:

Zhu, Yingke [1]

Ko, Tom [2]

Snyder, David [3,4]

Mak, Brian [1]

Povey, Daniel [3,4]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China

[2] Huawei Noahs Ark Res Lab, Hong Kong, Peoples R China

[3] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

[4] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

speaker recognition; deep neural networks; self-attention; x-vectors;

DOI:

10.21437/Interspeech.2018-1158

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper introduces a new method to extract speaker embeddings from a deep neural network (DNN) for text-independent speaker verification. Usually, speaker embeddings are extracted from a speaker-classification DNN that averages the hidden vectors over the frames of a speaker; the hidden vectors produced from all the frames are assumed to be equally important. We relax this assumption and compute the speaker embedding as a weighted average of a speaker's frame-level hidden vectors, with the weights determined automatically by a self-attention mechanism. The effect of multiple attention heads is also investigated, with the aim of capturing different aspects of a speaker's input speech. Finally, a PLDA classifier is used to compare pairs of embeddings. The proposed self-attentive speaker embedding system is compared with a strong DNN embedding baseline on NIST SRE 2016. We find that the self-attentive embeddings achieve superior performance. Moreover, the improvement produced by the self-attentive speaker embeddings is consistent across both short and long testing utterances.
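The weighted-average pooling described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the two-layer scoring network with a ReLU, the function name `self_attentive_pooling`, the matrices `W1` and `W2`, and all dimensions are assumptions chosen for illustration, and the weights here are random rather than trained.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attentive_pooling(H, W1, W2):
    """Pool frame-level vectors into one utterance-level embedding.

    H:  (T, d_h) hidden vectors, one row per frame.
    W1: (d_h, d_a) and W2: (d_a, n_heads) are hypothetical scoring weights.
    Returns the concatenated per-head weighted averages, shape (n_heads * d_h,),
    and the attention weights, shape (T, n_heads).
    """
    scores = np.maximum(H @ W1, 0.0) @ W2  # one score per frame and head
    A = softmax(scores, axis=0)            # weights over frames sum to 1 per head
    E = A.T @ H                            # (n_heads, d_h) weighted averages
    return E.reshape(-1), A


# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
T, d_h, d_a, n_heads = 200, 8, 4, 2
H = rng.standard_normal((T, d_h))
W1 = rng.standard_normal((d_h, d_a))
W2 = rng.standard_normal((d_a, n_heads))

embedding, A = self_attentive_pooling(H, W1, W2)
mean_embedding = H.mean(axis=0)  # the equal-weight baseline the abstract relaxes
```

When every frame receives the same score, the softmax weights become uniform and each head reduces to the plain average over frames, which is the equal-importance assumption the abstract attributes to the baseline.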

Pages: 3573-3577

Number of pages: 5
