Deep Neural Network Embeddings for Text-Independent Speaker Verification

被引:551
|
作者
Snyder, David [1 ]
Garcia-Romero, Daniel
Povey, Daniel
Khudanpur, Sanjeev
机构
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
基金
美国国家科学基金会;
关键词
speaker recognition; speaker verification; deep neural networks;
D O I
10.21437/Interspeech.2017-620
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper investigates replacing i-vectors for text-independent speaker verification with embeddings extracted from a feed forward deep neural network. Long-term speaker characteristics are captured in the network by a temporal pooling layer that aggregates over the input speech. This enables the network to be trained to discriminate between speakers from variable length speech segments. After training, utterances are mapped directly to fixed-dimensional speaker embeddings and pairs of embeddings are scored using a PLDA-based backend. We compare performance with a traditional i-vector baseline on NIST SRE 2010 and 2016. We find that the embeddings outperform i-vectors for short speech segments and are competitive on long duration test conditions. Moreover. the two representations are complementary. and their fusion improves on the baseline at all operating points. Similar systems have recently shown promising results when trained on very large proprietary datasets, but to the best of our knowledge, these arc the best results reported for speaker-discriminative neural networks when trained and tested on publicly available corpora.
引用
收藏
页码:999 / 1003
页数:5
相关论文
共 50 条
  • [1] Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification
    You, Lanhua
    Guo, Wu
    Dai, Li-Rong
    Du, Jun
    [J]. INTERSPEECH 2019, 2019, : 1168 - 1172
  • [2] Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings
    Zhang, Chunlei
    Koishida, Kazuhito
    Hansen, John H. L.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (09) : 1633 - 1644
  • [3] Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition
    Cai, Danwei
    Cai, Zexin
    Li, Ming
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1478 - 1482
  • [4] Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification
    Bhattacharya, Gautam
    Alam, Jahangir
    Gupta, Vishwa
    Kenny, Patrick
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3588 - 3592
  • [5] Group-based speaker embeddings for text-independent speaker verification
    Jung, Youngmoon
    Eom, Youngsik
    Lee, Yeonghyeon
    Kim, Hoirin
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (05): : 496 - 502
  • [6] Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification
    Zhu, Yingke
    Ko, Tom
    Snyder, David
    Mak, Brian
    Povey, Daniel
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3573 - 3577
  • [7] OPTIMIZING NEURAL NETWORK EMBEDDINGS USING A PAIR-WISE LOSS FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Dhamyal, Hira
    Zhou, Tianyan
    Raj, Bhiksha
    Singh, Rita
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 742 - 748
  • [8] Deep Speaker Feature Learning for Text-independent Speaker Verification
    Li, Lantian
    Chen, Yixiang
    Shi, Zing
    Tang, Zhiyuan
    Wang, Dong
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1542 - 1546
  • [9] Bayesian Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification
    Zhu, Yingke
    Mak, Brian
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1000 - 1012
  • [10] Neural Embedding Extractors for Text-Independent Speaker Verification
    Alam, Jahangir
    Kang, Woohyun
    Fathan, Abderrahim
    [J]. SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 10 - 23