Nonlinear I-Vector Transformations for PLDA-Based Speaker Recognition

Cited by: 18
Authors
Cumani, Sandro [1]
Laface, Pietro [1]
Affiliations
[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, I-10143 Turin, Italy
Keywords
Density function transformation; i-vectors; probabilistic linear discriminant analysis; speaker recognition
DOI
10.1109/TASLP.2017.2674966
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper proposes to estimate parametric nonlinear transformations of i-vectors for speaker recognition systems based on probabilistic linear discriminant analysis (PLDA) classification. The Gaussian PLDA model assumes that the i-vectors are distributed according to the standard normal distribution. However, it has been shown that i-vectors are better modeled by, for example, heavy-tailed distributions, and that classification performance improves significantly when the i-vectors are whitened and length-normalized. In this paper, we propose to transform the i-vectors so that their distribution becomes more suitable for discriminating speakers with the PLDA model. This is done by means of a sequence of affine and nonlinear transformations whose parameters are obtained by maximum likelihood estimation on the development set. Another contribution of this paper is the reduction of the mismatch between the development and evaluation i-vector length distributions by means of a scaling factor tuned for the estimated i-vector distribution, rather than by means of a blind length normalization. The proposed technique yields a relative improvement of between 7% and 14% in the detection cost function on the NIST SRE-2010 and SRE-2012 evaluation datasets, using both the traditional GMM/UBM and the hybrid DNN/GMM-based systems.
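For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of conventional whitening followed by blind length normalization. It is not the transformation proposed in the paper. The function names are hypothetical, and Cholesky whitening is an assumption chosen for brevity (other whitening matrices are equally valid).

```python
import math


def mean_and_covariance(vectors):
    """Return the sample mean and covariance of a list of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def whiten(x, mean, low):
    """Solve L y = x - mean by forward substitution, so that cov(y) = I."""
    y = []
    for i in range(len(x)):
        s = x[i] - mean[i] - sum(low[i][k] * y[k] for k in range(i))
        y.append(s / low[i][i])
    return y


def length_normalize(y):
    """Scale y to unit Euclidean norm (blind length normalization)."""
    norm = math.sqrt(sum(c * c for c in y))
    return [c / norm for c in y]
```

In this sketch the mean and Cholesky factor would be estimated once on the development set and then applied unchanged to evaluation i-vectors. The abstract's point is that the final step, projecting every vector onto the unit sphere regardless of its distribution, can be replaced by a scaling factor tuned for the estimated i-vector distribution.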
Pages: 908-919
Number of pages: 12