Speaker recognition in duration-mismatched condition using bootstrapped i-vectors

Cited by: 0

Authors:

Ando, Atsushi [1]

Asami, Taichi [1]

Yamaguchi, Yoshikazu [1]

Aono, Yushi [1]

Affiliations:

[1] NTT Corp, NTT Media Intelligence Labs, Tokyo, Japan

Source:

2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2016

Keywords:

DOI:

10.1109/APSIPA.2016.7820803

Chinese Library Classification (CLC) code:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Subject classification code:

0808 ; 0809 ;

Abstract:

This paper presents a novel speaker recognition framework that handles duration mismatch between registered and test utterances. The i-vectors extracted from short utterances exhibit high variance due to phoneme imbalance, which degrades performance in the duration-mismatched condition. Most conventional methods attempt to decrease the variance by offsetting i-vectors or speaker similarity scores; however, the variances caused by duration differences are usually too complex to offset. Instead of offsetting, the proposed method, inspired by ensemble learning, attains low-variance results by generating multiple fixed-length short utterances from the registered and test utterances and integrating their speaker similarities. Bootstrapped i-vectors are extracted from the generated short utterances, and the average PLDA score over the combinations of registered and test bootstrapped i-vectors is used for the speaker decision. Experiments show that the proposed method improves the equal error rate in trials with 60-second registered utterances and test utterances shorter than 5 seconds, with a relative error reduction of 73.9% to 90.6%. Moreover, the proposed method appears to have smaller score variance than the baseline.
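The scoring scheme summarized in the abstract (resample fixed-length short utterances from both sides, embed each one, then average the pairwise scores) can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: short utterances are generated here by sampling frames with replacement, `embed` (a per-dimension frame mean) stands in for a real i-vector extractor, and `cosine_score` stands in for PLDA scoring. All function names, parameters, and default values are hypothetical.

```python
import random
from itertools import product
from math import sqrt


def bootstrap_segments(frames, seg_len, n_segments, rng):
    """Draw n_segments fixed-length pseudo-utterances.

    Assumption: frames are sampled with replacement; the abstract does not
    say how the short utterances are actually generated.
    """
    return [[rng.choice(frames) for _ in range(seg_len)]
            for _ in range(n_segments)]


def embed(segment):
    """Placeholder for i-vector extraction: per-dimension mean of the frames."""
    dim = len(segment[0])
    return [sum(frame[d] for frame in segment) / len(segment)
            for d in range(dim)]


def cosine_score(a, b):
    """Placeholder for PLDA scoring: cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def bootstrapped_score(enroll_frames, test_frames, seg_len=50, n_segments=10,
                       seed=0, embed_fn=embed, score_fn=cosine_score):
    """Average the scores over all enrollment/test bootstrap pairs."""
    rng = random.Random(seed)
    enroll_vecs = [embed_fn(s) for s in
                   bootstrap_segments(enroll_frames, seg_len, n_segments, rng)]
    test_vecs = [embed_fn(s) for s in
                 bootstrap_segments(test_frames, seg_len, n_segments, rng)]
    scores = [score_fn(e, t) for e, t in product(enroll_vecs, test_vecs)]
    return sum(scores) / len(scores)
```

Because both sides are resampled to the same segment length, every comparison is made between embeddings of matched duration, and averaging over the `n_segments * n_segments` pairs is what is meant to reduce score variance. In a real system, `embed_fn` and `score_fn` would be replaced by a trained i-vector extractor and a PLDA model.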


Pages: 4

50 entries in total

[1] Duration compensation of i-vectors for short duration speaker verification
Ma, Jianbo
Sethu, Vidhyasaharan
Ambikairajah, Eliathamby
Lee, Kong Aik
[J]. ELECTRONICS LETTERS, 2017, 53 (06) : 405 - 407
[2] Discriminative Scoring for Speaker Recognition Based on I-vectors
Wang, Jun
Wang, Dong
Zhu, Ziwei
Zheng, Thomas Fang
Soong, Frank
[J]. 2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
[3] Speaker age estimation using i-vectors
Bahari, Mohamad Hasan
McLaren, Mitchell
Van Hamme, Hugo
van Leeuwen, David A.
[J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2014, 34 : 99 - 108
[4] Novel Quality Metric for Duration Variability Compensation in Speaker Verification using i-Vectors
Poddar, Arnab
Sahidullah, Md
Saha, Goutam
[J]. 2017 NINTH INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION (ICAPR), 2017, : 298 - 303
[5] Speaker age classification and regression using i-vectors
Grzybowska, Joanna
Kacprzak, Stanislaw
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1402 - 1406
[6] IMPROVED SPEAKER RECOGNITION WHEN USING I-VECTORS FROM MULTIPLE SPEECH SOURCES
McLaren, Mitchell
van Leeuwen, David
[J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5460 - 5463
[7] SOURCE-NORMALISED-AND-WEIGHTED LDA FOR ROBUST SPEAKER RECOGNITION USING I-VECTORS
McLaren, Mitchell
van Leeuwen, David
[J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5456 - 5459
[8] Robust Speaker Recognition Using MAP Estimation of Additive Noise in i-vectors Space
Ben Kheder, Waad
Matrouf, Driss
Bousquet, Pierre-Michel
Bonastre, Jean-Francois
Ajili, Moez
[J]. STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2014, 2014, 8791 : 97 - 107
[9] Accounting For Uncertainty of i-vectors in Speaker Recognition Using Uncertainty Propagation and Modified Imputation
Saeidi, Rahim
Alku, Paavo
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3546 - 3550
[10] Probabilistic approach using joint clean and noisy i-vectors modeling for speaker recognition
Ben Kheder, Waad
Matrouf, Driss
Ajili, Moez
Bonastre, Jean-Francois
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3638 - 3642
