Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification

Cited by: 171
Authors
Zhu, Yingke [1]
Ko, Tom [2]
Snyder, David [3, 4]
Mak, Brian [1]
Povey, Daniel [3, 4]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Huawei Noahs Ark Res Lab, Hong Kong, Peoples R China
[3] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[4] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Keywords
speaker recognition; deep neural networks; self-attention; x-vectors;
DOI
10.21437/Interspeech.2018-1158
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper introduces a new method to extract speaker embeddings from a deep neural network (DNN) for text-independent speaker verification. Usually, speaker embeddings are extracted from a speaker-classification DNN that averages the hidden vectors over the frames of a speaker; the hidden vectors produced from all the frames are assumed to be equally important. We relax this assumption and compute the speaker embedding as a weighted average of a speaker's frame-level hidden vectors, with the weights determined automatically by a self-attention mechanism. The effect of using multiple attention heads to capture different aspects of a speaker's input speech is also investigated. Finally, a PLDA classifier is used to compare pairs of embeddings. The proposed self-attentive speaker embedding system is compared with a strong DNN embedding baseline on NIST SRE 2016. We find that the self-attentive embeddings achieve superior performance. Moreover, the improvement produced by the self-attentive speaker embeddings is consistent across both short and long test utterances.
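The weighted-average pooling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the scoring network, so the sketch assumes a single linear scorer per head, and the names `attentive_pool`, `multi_head_pool`, and `score_vector` are hypothetical. Each frame receives a scalar score, a softmax over the frames turns the scores into weights, and the embedding is the weighted average of the frame-level hidden vectors. With several heads, the per-head averages are concatenated here, which is also an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(frames, score_vector):
    """Return (pooled_vector, weights) for one attention head.

    frames: list of T frame-level hidden vectors, each of dimension D.
    score_vector: hypothetical learned vector of dimension D; its dot
    product with a frame is that frame's unnormalized attention score.
    """
    scores = [sum(h_d * v_d for h_d, v_d in zip(h, score_vector)) for h in frames]
    weights = softmax(scores)  # one weight per frame, summing to 1
    dim = len(frames[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


def multi_head_pool(frames, score_vectors):
    """Concatenate the pooled vectors of several heads (an assumption)."""
    embedding = []
    for v in score_vectors:
        pooled, _ = attentive_pool(frames, v)
        embedding.extend(pooled)
    return embedding
```

With an all-zero `score_vector`, every frame gets the same weight and the function reduces to the plain averaging that the abstract describes as the usual approach.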
Pages: 3573-3577
Number of pages: 5