Tied Variational Autoencoder Backends for i-Vector Speaker Recognition

Cited by: 17
Authors
Villalba, Jesus [1]
Brummer, Niko [2]
Dehak, Najim [1]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
[2] Nuance Commun Inc, Pretoria, South Africa
Keywords
speaker recognition; i-vectors; variational autoencoders; stochastic variational inference; PLDA
DOI
10.21437/Interspeech.2017-1018
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Probabilistic linear discriminant analysis (PLDA) is the de facto standard for backends in i-vector speaker recognition. If we try to extend the PLDA paradigm using non-linear models, e.g., deep neural networks, the posterior distributions of the latent variables and the marginal likelihood become intractable. In this paper, we propose to approach this problem using stochastic gradient variational Bayes. We generalize the PLDA model to let i-vectors depend non-linearly on the latent factors. We approximate the evidence lower bound (ELBO) by Monte Carlo sampling using the reparametrization trick. This enables us to optimize the ELBO using backpropagation to jointly estimate the parameters that define the model and the approximate posteriors of the latent factors. We also present a reformulation of the likelihood ratio, which we call Q-scoring. Q-scoring makes it possible to score the speaker verification trials efficiently for this model. Experimental results on NIST SRE10 suggest that more data might be required to exploit the potential of this method.
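The Monte Carlo ELBO approximation mentioned in the abstract can be illustrated with a minimal scalar sketch. This is not the authors' model: the `tanh` decoder, its parameters `w` and `b`, the noise variance, and the function names are all hypothetical, and the toy assumes that several observations from one speaker share a single latent factor `z` with a standard normal prior and a Gaussian approximate posterior.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of a scalar Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def decoder(z, w=1.5, b=0.1):
    """Hypothetical non-linear map from the latent factor to the observation mean."""
    return math.tanh(w * z) + b


def elbo_estimate(xs, mu, log_var, noise_var=0.25, n_samples=64, rng=None):
    """Monte Carlo ELBO for observations xs that share one latent z.

    Assumed toy setup: q(z) = N(mu, exp(log_var)), prior p(z) = N(0, 1),
    likelihood p(x | z) = N(decoder(z), noise_var).
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so repeated calls are reproducible
    std = math.exp(0.5 * log_var)
    expected_ll = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)  # noise drawn independently of (mu, log_var)
        z = mu + std * eps         # reparametrized sample from q(z)
        expected_ll += sum(log_normal(x, decoder(z), noise_var) for x in xs)
    expected_ll /= n_samples
    # Closed-form KL(q(z) || N(0, 1)) for a scalar Gaussian.
    kl = 0.5 * (math.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return expected_ll - kl


# Two observations assumed to come from the same speaker.
print(elbo_estimate([0.8, 0.9], mu=1.0, log_var=-2.0))
```

Because each sample is written as `mu + std * eps`, the estimate is a differentiable function of `mu` and `log_var`. In an automatic differentiation framework, this is what would allow the posterior and decoder parameters to be updated jointly by backpropagation.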
Pages: 1004-1008
Page count: 5