Source-Normalized LDA for Robust Speaker Recognition Using i-Vectors From Multiple Speech Sources

Cited by: 50
Authors
McLaren, Mitchell [1 ]
van Leeuwen, David [1 ]
Affiliations
[1] Radboud Univ Nijmegen, Ctr Language & Speech Technol, NL-6500 HC Nijmegen, Netherlands
Keywords
Cross-channel source variation; i-vector; linear discriminant analysis (LDA); speaker recognition; total variability; COMPENSATION; VARIABILITY;
DOI
10.1109/TASL.2011.2164533
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
The recent development of the i-vector framework for speaker recognition has set a new performance standard in the research field. An i-vector is a compact representation of a speaker's utterance extracted from a total variability subspace. Prior to classification using a cosine kernel, i-vectors are projected into a linear discriminant analysis (LDA) space in order to reduce inter-session variability and enhance speaker discrimination. The accurate estimation of this LDA space from a training dataset is crucial to detection performance. A typical training dataset, however, does not consist of utterances acquired through all sources of interest for each speaker. This has the effect of introducing systematic variation related to the speech source in the between-speaker covariance matrix and results in an incomplete representation of the within-speaker scatter matrix used for LDA. The recently proposed source-normalized (SN) LDA algorithm improves the robustness of i-vector-based speaker recognition under both mismatched evaluation conditions and conditions for which inadequate speech resources are available for suitable system development. When evaluated on the recent NIST 2008 and 2010 Speaker Recognition Evaluations (SRE), SN-LDA demonstrated relative improvements of up to 38% in equal error rate (EER) and 44% in minimum detection cost function (DCF) over LDA under mismatched and sparsely resourced evaluation conditions, while also providing improvements in the common telephone-only conditions. Extending these initial developments, this study provides a thorough analysis of how SN-LDA transforms the i-vector space to reduce source variation and of its robustness to varying evaluation and LDA training conditions. The concept of source normalization is further extended to within-class covariance normalization (WCCN) and data-driven source detection.
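The pipeline described in the abstract (estimate an LDA projection, project i-vectors, score with a cosine kernel) can be illustrated with a minimal NumPy sketch. This is one possible reading of source normalization, not the authors' implementation: the abstract does not give the formulas, so the choices below are assumptions. In particular, the between-speaker scatter is accumulated per source around that source's own mean, the within-speaker scatter is taken as the total scatter minus the between-speaker scatter, and the function names `sn_lda` and `cosine_score` and the `ridge` parameter are hypothetical.

```python
import numpy as np


def sn_lda(ivectors, speakers, sources, n_dims, ridge=1e-6):
    """Sketch of a source-normalized LDA projection (assumed formulation)."""
    X = np.asarray(ivectors, dtype=float)
    speakers = np.asarray(speakers)
    sources = np.asarray(sources)

    X = X - X.mean(axis=0)          # center on the global mean
    dim = X.shape[1]
    S_T = X.T @ X                   # total scatter

    # Between-speaker scatter, computed separately inside each source so that
    # the offset between source means does not count as speaker variation.
    S_B = np.zeros((dim, dim))
    for src in np.unique(sources):
        in_src = sources == src
        mu_src = X[in_src].mean(axis=0)
        for spk in np.unique(speakers[in_src]):
            rows = X[in_src & (speakers == spk)]
            d = rows.mean(axis=0) - mu_src
            S_B += len(rows) * np.outer(d, d)

    # Assumption: within-speaker scatter is what remains of the total scatter.
    # The ridge term only keeps the Cholesky factorization well conditioned.
    S_W = S_T - S_B + ridge * np.eye(dim)

    # Solve S_B v = lambda S_W v by whitening with the Cholesky factor of S_W.
    L = np.linalg.cholesky(S_W)
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_B @ L_inv.T
    M = (M + M.T) / 2.0             # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)
    top = np.argsort(eigvals)[::-1][:n_dims]
    return L_inv.T @ eigvecs[:, top]   # columns are projection directions


def cosine_score(a, b, A):
    """Cosine similarity of two i-vectors after projection by A."""
    pa, pb = A.T @ a, A.T @ b
    return float(pa @ pb / (np.linalg.norm(pa) * np.linalg.norm(pb)))
```

With `A = sn_lda(train_ivectors, speaker_ids, source_ids, n_dims=200)`, a trial would be scored as `cosine_score(enrol, test, A)` and compared against a decision threshold; the value 200 here is purely illustrative.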
Pages: 755-766
Number of pages: 12
Related papers
29 in total
  • [1] SOURCE-NORMALISED-AND-WEIGHTED LDA FOR ROBUST SPEAKER RECOGNITION USING I-VECTORS
    McLaren, Mitchell
    van Leeuwen, David
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5456 - 5459
  • [2] IMPROVED SPEAKER RECOGNITION WHEN USING I-VECTORS FROM MULTIPLE SPEECH SOURCES
    McLaren, Mitchell
    van Leeuwen, David
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5460 - 5463
  • [3] To Weight or not to Weight: Source-Normalised LDA for Speaker Recognition using i-vectors
    McLaren, Mitchell
    van Leeuwen, David
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2720 - 2723
  • [4] ROBUST SPEAKER RECOGNITION BASED ON DNN/I-VECTORS AND SPEECH SEPARATION
    Chang, Jorge
    Wang, DeLiang
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5415 - 5419
  • [5] Robust Speaker Recognition Using MAP Estimation of Additive Noise in i-vectors Space
    Ben Kheder, Waad
    Matrouf, Driss
    Bousquet, Pierre-Michel
    Bonastre, Jean-Francois
    Ajili, Moez
    [J]. STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2014, 2014, 8791 : 97 - 107
  • [6] Speaker Recognition With Random Digit Strings Using Uncertainty Normalized HMM-Based i-Vectors
    Maghsoodi, Nooshin
    Sameti, Hossein
    Zeinal, Hossein
    Stafylakis, Themos
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1815 - 1825
  • [7] Robust Speaker Verification Using GFCC Based i-Vectors
    Jeevan, Medikonda
    Dhingra, Atul
    Hanmandlu, M.
    Panigrahi, B. K.
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL, NETWORKS, COMPUTING, AND SYSTEMS (ICSNCS 2016), VOL 1, 2017, 395 : 85 - 91
  • [8] Speaker recognition in duration-mismatched condition using bootstrapped i-vectors
    Ando, Atsushi
    Asami, Taichi
    Yamaguchi, Yoshikazu
    Aono, Yushi
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [9] Power Normalized Cepstral Coefficients based supervectors and i-vectors for small vocabulary speech recognition
    Principi, Emanuele
    Squartini, Stefano
    Piazza, Francesco
    [J]. PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 3562 - 3568
  • [10] Age Estimation from Telephone Speech using i-vectors
    Bahari, Mohamad Hasan
    McLaren, Mitchell
    Van Hamme, Hugo
    Van Leeuwen, David
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 506 - 509