Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification

Cited by: 4
Authors
Li, Na [1 ]
Mak, Man-Wai [1 ]
Lin, Wei-Wei [1 ]
Chien, Jen-Tzung [2 ]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Hong Kong, Peoples R China
[2] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan
Keywords
Speaker verification; Duration variation; SNR mismatch; Variational Bayes; I-vector; PLDA;
DOI
10.1016/j.csl.2017.04.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although i-vectors together with probabilistic LDA (PLDA) have achieved great success in speaker verification, suppressing the undesirable effects caused by variability in utterance length and background noise level remains a challenge. This paper aims to improve the robustness of i-vector based speaker verification systems by compensating for utterance-length variability and noise-level variability. Inspired by the recent findings that noise-level variability can be modeled by a signal-to-noise ratio (SNR) subspace and that duration variability can be modeled as additive noise in the i-vector space, we propose to add an SNR factor and a duration factor to the PLDA model. In this framework, we assume that i-vectors derived from utterances with comparable durations share similar duration-specific information and that i-vectors extracted from utterances within a narrow SNR range have similar SNR-specific information. Based on these assumptions, an i-vector can be represented as a linear combination of four components: speaker, SNR, duration, and channel. A variational Bayes algorithm is developed to infer this latent variable model via a discriminative subspace training procedure. In the testing stage, the different variabilities are compensated for when computing the likelihood ratio. Experiments on Common Conditions 1 and 4 in NIST 2012 SRE show that the proposed model outperforms the conventional PLDA and SNR-invariant PLDA. Results also show that the proposed model performs better than the uncertainty-propagation PLDA (UP-PLDA) for long test utterances. (C) 2017 Elsevier Ltd. All rights reserved.
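The four-component decomposition described in the abstract can be illustrated with a small synthetic sketch. The abstract gives no notation, so everything below is an assumption made for illustration: the symbols `m`, `V`, `U`, `R`, `h`, `w`, `y`, the function name `sample_ivectors`, the Gaussian priors, the dimensions, and the treatment of the channel component as an isotropic residual. It only samples vectors from such an additive model; it does not implement the paper's variational Bayes training or likelihood-ratio scoring.

```python
import numpy as np

def sample_ivectors(n_speakers=3, n_snr_groups=2, n_dur_groups=2,
                    dim=8, rank=2, noise_std=0.1, seed=0):
    """Sample synthetic i-vectors as mean + speaker + SNR + duration + residual.

    All names and sizes are hypothetical; the residual stands in for the
    channel component mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    m = rng.normal(size=dim)            # global mean (assumed)
    V = rng.normal(size=(dim, rank))    # speaker subspace (assumed)
    U = rng.normal(size=(dim, rank))    # SNR subspace (assumed)
    R = rng.normal(size=(dim, rank))    # duration subspace (assumed)
    h = rng.normal(size=(n_speakers, rank))    # one factor per speaker
    w = rng.normal(size=(n_snr_groups, rank))  # one factor per SNR group
    y = rng.normal(size=(n_dur_groups, rank))  # one factor per duration group
    X, labels = [], []
    for s in range(n_speakers):
        for k in range(n_snr_groups):
            for p in range(n_dur_groups):
                eps = noise_std * rng.normal(size=dim)  # residual term
                X.append(m + V @ h[s] + U @ w[k] + R @ y[p] + eps)
                labels.append((s, k, p))
    return np.array(X), labels

X, labels = sample_ivectors()
print(X.shape, labels[:3])
```

In this toy setting, i-vectors that fall in the same duration group share the same `R @ y[p]` term regardless of speaker, which mirrors the abstract's assumption that utterances of comparable duration carry similar duration-specific information; the SNR term behaves the same way for SNR groups.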
Pages: 83-103
Number of pages: 21
Related papers
50 in total
  • [1] SNR-Invariant PLDA Modeling in Nonparametric Subspace for Robust Speaker Verification
    Li, Na
    Mak, Man-Wai
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (10) : 1648 - 1659
  • [2] Deep Discriminative Embeddings for Duration Robust Speaker Verification
    Li, Na
    Tuo, Deyi
    Su, Dan
    Li, Zhifeng
    Yu, Dong
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2262 - 2266
  • [3] Robust speaker verification with state duration modeling
    Yoma, NB
    Pegoraro, TF
    [J]. SPEECH COMMUNICATION, 2002, 38 (1-2) : 77 - 88
  • [4] SNR-Invariant PLDA Modeling for Robust Speaker Verification
    Li, Na
    Mak, Man-Wai
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2317 - 2321
  • [5] Robust discriminative feature subspace analysis for kinship verification
    Goyal, Aarti
    Meenpal, Toshanlal
    [J]. INFORMATION SCIENCES, 2021, 578 : 507 - 524
  • [6] PLDA Modeling in the Fishervoice Subspace for Speaker Verification
    Zhong, Jinghua
    Jiang, Weiwu
    Rao, Wei
    Mak, Man-Wai
    Meng, Helen
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1130 - 1134
  • [7] Mismatch modeling and compensation for robust speaker verification
    Lei, Yun
    Hansen, John H. L.
    [J]. SPEECH COMMUNICATION, 2011, 53 (02) : 257 - 268
  • [8] Discriminative Adaptation for Speaker Verification
    Longworth, C.
    Gales, M. J. F.
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1467 - 1470
  • [9] DISCRIMINATIVE AUTOENCODERS FOR SPEAKER VERIFICATION
    Lee, Hung-Shin
    Lu, Yu-Ding
    Hsu, Chin-Cheng
    Tsao, Yu
    Wang, Hsin-Min
    Leng, Shyh-Kang
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5375 - 5379
  • [10] Discriminative adaptation for speaker verification
    Korkmazskiy, F
    Juang, BH
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1744 - 1747