The Opensesame NIST 2016 Speaker Recognition Evaluation System

Cited by: 3
Authors
Liu, Gang [1]
Qian, Qi [1]
Wang, Zhibin [1]
Zhao, Qingen [1]
Wang, Tianzhou [1]
Li, Hao [1]
Xue, Jian [1]
Zhu, Shenghuo [1]
Jin, Rong [1]
Zhao, Tuo [1, 2]
Affiliations
[1] Alibaba Grp US Inc, Hangzhou, Zhejiang, Peoples R China
[2] Univ Missouri, Columbia, MO 65211 USA
Keywords
symmetric SVM; distance metric learning; SRE2016; language mismatch; speaker recognition; MULTI-SESSION; BACK-END;
DOI
10.21437/Interspeech.2017-997
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The last two decades have witnessed significant progress in speaker recognition, as evidenced by the improving performance in the speaker recognition evaluations (SRE) hosted by NIST. Despite this progress, little research has focused on speaker recognition under short-duration and language-mismatch conditions, which often lead to poor recognition performance. In NIST SRE2016, these concerns were systematically investigated by the speaker recognition community for the first time. In this study, we address these challenges from the viewpoints of feature extraction and modeling. In particular, we improve the robustness of features by combining GMM- and DNN-based iVector extraction approaches, and improve the reliability of the back-end model by exploiting a symmetric SVM that can effectively leverage unlabeled data. We also introduce distance metric learning to improve the generalization capacity of the development data, which is usually limited in size. Finally, a fusion strategy is adopted to collectively boost performance. The effectiveness of the proposed scheme for speaker recognition is demonstrated on the SRE2016 evaluation data: compared with a DNN-iVector PLDA baseline system, our method yields a 25.6% relative improvement in terms of min_Cprimary.
Pages: 2854-2858
Page count: 5
Related papers
50 in total
  • [21] Speaker diarization system on the 2007 NIST rich transcription meeting recognition evaluation
    Sun, Hanwu
    Nwe, Tin Lay
    Chin, Eugene
    Koh, Wei
    Bin, Ma
    Li, Haizhou
    MULTIMEDIA SYSTEMS AND APPLICATIONS X, 2007, 6777
  • [22] THU-EE System Fusion for the NIST 2012 Speaker Recognition Evaluation
    Zhang, Wei-Qiang
    Li, Zhi-Yi
    Liu, Weiwei
    Liu, Jia
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2473 - 2477
  • [23] CRSS SYSTEMS FOR 2012 NIST SPEAKER RECOGNITION EVALUATION
    Hasan, Taufiq
    Sadjadi, Seyed Omid
    Liu, Gang
    Shokouhi, Navid
    Boril, Hynek
    Hansen, John H. L.
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6783 - 6787
  • [24] The NIST SRE Summed Channel Speaker Recognition System
    Sun, Hanwu
    Ma, Bin
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1111 - 1114
  • [25] Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006
    Bruemmer, Niko
    Burget, Lukas
    Cernocky, Jan 'Honza'
    Glembek, Ondrej
    Grezl, Frantisek
    Karafiat, Martin
    van Leeuwen, David A.
    Matejka, Pavel
    Schwarz, Petr
    Strasheim, Albert
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07): : 2072 - 2084
  • [26] Rapid channel compensation for speaker verification in the NIST 2000 speaker recognition evaluation
    Pelecanos, J.
    Sridharan, S.
    Acoustics Australia, 2001, 29 (01) : 17 - 20
  • [27] LOQUENDO - POLITECNICO DI TORINO'S 2010 NIST SPEAKER RECOGNITION EVALUATION SYSTEM
    Castaldo, Fabio
    Colibro, Daniele
    Vair, Claudio
    Cumani, Sandro
    Laface, Pietro
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5464 - 5467
  • [28] Nuance - Politecnico di Torino's 2012 NIST Speaker Recognition Evaluation System
    Colibro, Daniele
    Vair, Claudio
    Farrell, Kevin
    Krause, Nir
    Karvitsky, Gennady
    Cumani, Sandro
    Laface, Pietro
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1995 - 1999
  • [29] Performance Factor Analysis for the 2012 NIST Speaker Recognition Evaluation
    Martin, Alvin F.
    Greenberg, Craig S.
    Stanford, Vincent M.
    Howard, John M.
    Doddington, George R.
    Godfrey, John J.
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1135 - 1138
  • [30] Report on Performance Results in the NIST 2010 Speaker Recognition Evaluation
    Greenberg, Craig S.
    Martin, Alvin F.
    Barr, Bradford N.
    Doddington, George R.
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 268 - +