Tied Variational Autoencoder Backends for i-Vector Speaker Recognition

Cited by: 17

Authors:

Villalba, Jesus [1]

Brummer, Niko [2]

Dehak, Najim [1]

Affiliations:

[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA

[2] Nuance Commun Inc, Pretoria, South Africa

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Keywords:

speaker recognition; i-vectors; variational autoencoders; stochastic variational inference; PLDA

DOI:

10.21437/Interspeech.2017-1018

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Probabilistic linear discriminant analysis (PLDA) is the de facto standard for backends in i-vector speaker recognition. If we try to extend the PLDA paradigm using non-linear models, e.g., deep neural networks, the posterior distributions of the latent variables and the marginal likelihood become intractable. In this paper, we propose to approach this problem using stochastic gradient variational Bayes. We generalize the PLDA model to let i-vectors depend non-linearly on the latent factors. We approximate the evidence lower bound (ELBO) by Monte Carlo sampling using the reparametrization trick. This enables us to optimize the ELBO using backpropagation, jointly estimating the parameters that define the model and the approximate posteriors of the latent factors. We also present a reformulation of the likelihood ratio, which we call Q-scoring. Q-scoring makes it possible to score speaker verification trials efficiently for this model. Experimental results on NIST SRE10 suggest that more data might be required to exploit the potential of this method.
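
As an illustration of the Monte Carlo ELBO estimate with the reparametrization trick mentioned in the abstract, the following is a minimal sketch in Python, not the authors' model. It assumes a toy scalar model with prior z ~ N(0, 1), a linear decoder x | z ~ N(w z, noise_var), and a Gaussian approximate posterior q(z) = N(mu, s^2). The linear decoder is an assumption made only so the estimate can be checked against a closed form, whereas the paper lets i-vectors depend non-linearly on the latent factors. All function and parameter names are hypothetical.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of a scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def kl_to_standard_normal(mu, s):
    """KL( N(mu, s^2) || N(0, 1) ) for a scalar latent variable."""
    return 0.5 * (mu ** 2 + s ** 2 - 1.0) - math.log(s)


def elbo_monte_carlo(x, w, noise_var, mu, s, num_samples, rng):
    """Monte Carlo ELBO estimate using the reparametrization z = mu + s * eps.

    Toy assumption: the decoder mean is linear (w * z). A non-linear
    decoder would replace `w * z` with a neural network output.
    """
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)   # noise that does not depend on mu or s
        z = mu + s * eps            # reparametrized sample from q(z)
        total += log_normal(x, w * z, noise_var)
    expected_log_lik = total / num_samples
    return expected_log_lik - kl_to_standard_normal(mu, s)


def elbo_closed_form(x, w, noise_var, mu, s):
    """Exact ELBO, available here only because the toy decoder is linear."""
    expected_log_lik = (
        -0.5 * math.log(2.0 * math.pi * noise_var)
        - ((x - w * mu) ** 2 + (w * s) ** 2) / (2.0 * noise_var)
    )
    return expected_log_lik - kl_to_standard_normal(mu, s)


def log_marginal_likelihood(x, w, noise_var):
    """log p(x) for the toy model: x ~ N(0, w^2 + noise_var)."""
    return log_normal(x, 0.0, w ** 2 + noise_var)


def exact_posterior(x, w, noise_var):
    """Mean and standard deviation of the exact posterior p(z | x)."""
    post_var = 1.0 / (1.0 + w ** 2 / noise_var)
    post_mean = post_var * w * x / noise_var
    return post_mean, math.sqrt(post_var)
```

In this toy case the sampled estimate can be compared with `elbo_closed_form`, and the bound becomes equal to `log_marginal_likelihood` when q(z) is the exact posterior. With a non-linear decoder no such closed form exists, which is the situation the abstract describes. The sketch only evaluates the bound; optimizing it by backpropagation would additionally require an automatic differentiation framework.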


Pages: 1004-1008

Page count: 5

