The Opensesame NIST 2016 Speaker Recognition Evaluation System

Cited by: 3

Authors:

Liu, Gang [1]

Qian, Qi [1]

Wang, Zhibin [1]

Zhao, Qingen [1]

Wang, Tianzhou [1]

Li, Hao [1]

Xue, Jian [1]

Zhu, Shenghuo [1]

Jin, Rong [1]

Zhao, Tuo [1,2]

Affiliations:

[1] Alibaba Grp US Inc, Hangzhou, Zhejiang, Peoples R China

[2] Univ Missouri, Columbia, MO 65211 USA

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Keywords:

symmetric SVM; distance metric learning; SRE2016; language mismatch; speaker recognition; MULTI-SESSION; BACK-END;

DOI:

10.21437/Interspeech.2017-997

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The last two decades have witnessed significant progress in speaker recognition, as evidenced by the improving performance in the speaker recognition evaluations (SRE) hosted by NIST. Despite this progress, little research has focused on speaker recognition under short-duration and language-mismatch conditions, which often lead to poor recognition performance. In NIST SRE2016, these concerns were first systematically investigated by the speaker recognition community. In this study, we address these challenges from the viewpoint of feature extraction and modeling. In particular, we improve the robustness of features by combining GMM- and DNN-based iVector extraction approaches, and improve the reliability of the back-end model by exploiting a symmetric SVM that can effectively leverage unlabeled data. We also introduce distance metric learning to improve generalization from the development data, which is usually limited in size. Finally, a fusion strategy is adopted to collectively boost the performance. The effectiveness of the proposed scheme for speaker recognition is demonstrated on SRE2016 evaluation data: compared with a DNN-iVector PLDA baseline system, our method yields a 25.6% relative improvement in terms of min_Cprimary.

Pages: 2854-2858

Number of pages: 5
