Deep Neural Network Embeddings for Text-Independent Speaker Verification

Cited by: 551
Authors
Snyder, David [1]
Garcia-Romero, Daniel
Povey, Daniel
Khudanpur, Sanjeev
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Funding
National Science Foundation (USA)
Keywords
speaker recognition; speaker verification; deep neural networks
DOI
10.21437/Interspeech.2017-620
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates replacing i-vectors for text-independent speaker verification with embeddings extracted from a feed-forward deep neural network. Long-term speaker characteristics are captured in the network by a temporal pooling layer that aggregates over the input speech. This enables the network to be trained to discriminate between speakers from variable-length speech segments. After training, utterances are mapped directly to fixed-dimensional speaker embeddings and pairs of embeddings are scored using a PLDA-based backend. We compare performance with a traditional i-vector baseline on NIST SRE 2010 and 2016. We find that the embeddings outperform i-vectors for short speech segments and are competitive on long-duration test conditions. Moreover, the two representations are complementary, and their fusion improves on the baseline at all operating points. Similar systems have recently shown promising results when trained on very large proprietary datasets, but to the best of our knowledge, these are the best results reported for speaker-discriminative neural networks when trained and tested on publicly available corpora.
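The temporal pooling idea in the abstract can be illustrated with a minimal sketch. It assumes the pooling layer concatenates the per-dimension mean and standard deviation of frame-level features; the abstract does not specify the aggregation, so that choice, the function name `temporal_pool`, and the feature dimension of 8 are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def temporal_pool(frames: np.ndarray) -> np.ndarray:
    """Collapse a (num_frames, feat_dim) array into one fixed-length vector.

    Assumption: mean + standard deviation pooling over the time axis.
    The output length is 2 * feat_dim regardless of num_frames.
    """
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty (num_frames, feat_dim) array")
    mean = frames.mean(axis=0)  # average over time
    std = frames.std(axis=0)    # spread over time
    return np.concatenate([mean, std])


# Two hypothetical segments of different durations (random stand-in features).
rng = np.random.default_rng(0)
short_segment = rng.standard_normal((50, 8))
long_segment = rng.standard_normal((900, 8))

short_vec = temporal_pool(short_segment)
long_vec = temporal_pool(long_segment)
```

Both segments yield vectors of the same length, which is what allows layers after the pooling step to operate on a fixed-dimensional input even though the speech segments vary in duration.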
Pages: 999-1003
Page count: 5