Nonlinear I-Vector Transformations for PLDA-Based Speaker Recognition

Cited by: 18
Authors
Cumani, Sandro [1 ]
Laface, Pietro [1 ]
Affiliations
[1] Politecn Torino, Dipartimento Automat & Informat, I-10143 Turin, Italy
Keywords
Density function transformation; i-vectors; probabilistic linear discriminant analysis; speaker recognition
DOI
10.1109/TASLP.2017.2674966
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
This paper proposes estimating parametric nonlinear transformations of i-vectors for speaker recognition systems based on probabilistic linear discriminant analysis (PLDA) classification. The Gaussian PLDA model assumes that the i-vectors are distributed according to the standard normal distribution. However, it has been shown that the i-vectors are better modeled, for example, by heavy-tailed distributions, and that significant improvements in classification performance can be obtained by whitening and length-normalizing the i-vectors. In this paper, we propose transforming the i-vectors so that their distribution becomes more suitable for discriminating speakers with the PLDA model. This is performed by means of a sequence of affine and nonlinear transformations whose parameters are obtained by maximum likelihood estimation on the development set. Another contribution of this paper is the reduction of the mismatch between the development and evaluation i-vector length distributions by means of a scaling factor tuned for the estimated i-vector distribution, rather than by means of a blind length normalization. Relative improvements in the detection cost function of between 7% and 14% were obtained with the proposed technique on the NIST SRE-2010 and SRE-2012 evaluation datasets, using both the traditional GMM/UBM and the hybrid DNN/GMM-based systems.
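The abstract contrasts the proposed maximum-likelihood transformations with the whitening and length normalization commonly applied to i-vectors before Gaussian PLDA scoring. As a point of reference, the following is a minimal sketch of that standard baseline preprocessing only (not the paper's estimated transformations); the function name and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def whiten_and_length_normalize(ivectors, dev_ivectors):
    """Baseline i-vector preprocessing for Gaussian PLDA: whitening
    estimated on a development set, followed by length normalization.

    ivectors     : (N, D) array of i-vectors to transform
    dev_ivectors : (M, D) array of development i-vectors
    """
    # Center with the development-set mean.
    mu = dev_ivectors.mean(axis=0)

    # ZCA whitening matrix from the development covariance.
    cov = np.cov(dev_ivectors - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + 1e-10)) @ eigvecs.T

    # Apply the same affine transform to the target i-vectors.
    x = (ivectors - mu) @ W

    # Length normalization: project onto the unit hypersphere so that
    # the transformed i-vectors better fit the Gaussian PLDA assumptions.
    return x / np.linalg.norm(x, axis=1, keepdims=True)
```

The paper's approach replaces this blind unit-length projection with a sequence of affine and nonlinear transformations, including a length-scaling factor, whose parameters are fitted by maximum likelihood to the development i-vector distribution.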
Pages: 908-919
Page count: 12