Speaker recognition in duration-mismatched condition using bootstrapped i-vectors

Cited by: 0
Authors
Ando, Atsushi [1]
Asami, Taichi [1]
Yamaguchi, Yoshikazu [1]
Aono, Yushi [1]
Affiliations
[1] NTT Corp, NTT Media Intelligence Labs, Tokyo, Japan
DOI
10.1109/APSIPA.2016.7820803
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject classification codes
0808; 0809
Abstract
This paper presents a novel speaker recognition framework that handles duration mismatch between registered and test utterances. The i-vectors extracted from short utterances exhibit high variance due to phoneme imbalance, which degrades performance in the duration-mismatched condition. Most conventional methods attempt to decrease the variance by offsetting i-vectors or speaker similarity scores; however, the variances caused by duration differences are usually too complex to offset. Instead of offsetting, the proposed method, inspired by ensemble learning, attains low-variance results by generating multiple fixed-length short utterances from the registered and test utterances and integrating their speaker similarities. Bootstrapped i-vectors are extracted from the generated short utterances, and the average PLDA score over the combinations of registered and test bootstrapped i-vectors is used for the speaker decision. Experiments show that the proposed method improves the equal error rate in trials with 60-second registered utterances and test utterances shorter than 5 seconds, with a relative error reduction of 73.9% to 90.6%. Moreover, the proposed method appears to have smaller score variance than the baseline.
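The scoring scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the fixed-length short utterances are generated, so resampling frames with replacement is an assumption here, and `embed` (a frame average) and `score` (cosine similarity) are hypothetical stand-ins for a real i-vector extractor and a PLDA scorer. All function names and default values are illustrative.

```python
import numpy as np


def bootstrap_segments(frames, seg_len, n_segments, rng):
    """Draw n_segments pseudo-utterances of seg_len frames each.

    Assumption: frames are resampled with replacement; the abstract does not
    specify the generation procedure.
    """
    idx = rng.integers(0, len(frames), size=(n_segments, seg_len))
    return frames[idx]  # shape: (n_segments, seg_len, feature_dim)


def embed(segment):
    """Hypothetical stand-in for an i-vector extractor: average the frames."""
    return segment.mean(axis=0)


def score(a, b):
    """Hypothetical stand-in for a PLDA score: cosine similarity."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def bootstrapped_score(enroll_frames, test_frames, seg_len=300, n_segments=10, seed=0):
    """Average the similarity over all enrollment/test segment combinations."""
    rng = np.random.default_rng(seed)
    enroll_vecs = [embed(s) for s in bootstrap_segments(enroll_frames, seg_len, n_segments, rng)]
    test_vecs = [embed(s) for s in bootstrap_segments(test_frames, seg_len, n_segments, rng)]
    scores = [score(e, t) for e in enroll_vecs for t in test_vecs]
    return float(np.mean(scores))
```

Because both sides are reduced to segments of the same length, every pairwise comparison is made between equal-duration inputs, and averaging over the combinations is what the abstract credits for the lower score variance.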
Pages: 4