The Opensesame NIST 2016 Speaker Recognition Evaluation System

Cited by: 3
Authors
Liu, Gang [1]
Qian, Qi [1]
Wang, Zhibin [1]
Zhao, Qingen [1]
Wang, Tianzhou [1]
Li, Hao [1]
Xue, Jian [1]
Zhu, Shenghuo [1]
Jin, Rong [1]
Zhao, Tuo [1, 2]
Affiliations
[1] Alibaba Grp US Inc, Hangzhou, Zhejiang, Peoples R China
[2] Univ Missouri, Columbia, MO 65211 USA
Keywords
symmetric SVM; distance metric learning; SRE2016; language mismatch; speaker recognition; MULTI-SESSION; BACK-END;
DOI
10.21437/Interspeech.2017-997
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The last two decades have witnessed significant progress in speaker recognition, as evidenced by the improving performance in the speaker recognition evaluations (SRE) hosted by NIST. Despite this progress, little research has focused on speaker recognition under short-duration and language-mismatch conditions, which often lead to poor recognition performance. In NIST SRE2016, these concerns were systematically investigated by the speaker recognition community for the first time. In this study, we address these challenges from the viewpoints of feature extraction and modeling. In particular, we improve the robustness of the features by combining GMM-based and DNN-based iVector extraction approaches, and we improve the reliability of the back-end model by exploiting a symmetric SVM that can effectively leverage unlabeled data. We also introduce distance metric learning to improve the generalization capacity obtained from the development data, which is usually limited in size. Finally, a fusion strategy is adopted to collectively boost performance. The effectiveness of the proposed scheme is demonstrated on the SRE2016 evaluation data: compared with a DNN-iVector PLDA baseline system, our method yields a 25.6% relative improvement in terms of min_Cprimary.
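For readers unfamiliar with how a relative improvement on a cost metric such as min_Cprimary is usually computed, here is a minimal sketch. It assumes the common convention that lower cost is better and that the improvement is the cost reduction divided by the baseline cost. The function name and the numeric values are hypothetical and chosen only to illustrate the arithmetic; they are not the costs reported in the paper.

```python
def relative_improvement(baseline_cost: float, system_cost: float) -> float:
    """Relative reduction of a cost metric (lower is better), in percent."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return 100.0 * (baseline_cost - system_cost) / baseline_cost


if __name__ == "__main__":
    # Hypothetical costs, NOT taken from the paper.
    baseline = 0.800
    fused = 0.595
    print(f"{relative_improvement(baseline, fused):.1f}% relative improvement")
```

Under this convention, a relative improvement of 25.6% means the system's cost is roughly three quarters of the baseline's cost.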
Pages: 2854-2858
Page count: 5
Related papers
50 in total
  • [1] The 2016 NIST Speaker Recognition Evaluation
    Sadjadi, Seyed Omid
    Kheyrkhah, Timothee
    Tong, Audrey
    Greenberg, Craig
    Reynolds, Douglas
    Singer, Elliot
    Mason, Lisa
    Hernandez-Cordero, Jaime
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1353 - 1357
  • [2] BUT system for NIST 2008 speaker recognition evaluation
    Burget, Lukas
    Fapso, Michal
    Hubeika, Valiantsina
    Glembek, Ondrej
    Karafiat, Martin
    Kockmann, Marcel
    Matejka, Pavel
    Schwarz, Petr
    Cernocky, Jan Honza
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2315 - 2318
  • [3] Nuance - Politecnico di Torino's 2016 NIST Speaker Recognition Evaluation System
    Colibro, Daniele
    Vair, Claudio
    Dalmasso, Emanuele
    Farrell, Kevin
    Karvitsky, Gennady
    Cumani, Sandro
    Laface, Pietro
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1338 - 1342
  • [4] The MIT-LL, JHU and LRDE NIST 2016 Speaker Recognition Evaluation System
    Torres-Carrasquillo, Pedro A.
    Richardson, Fred
    Nercessian, Shahan
    Sturim, Douglas
    Campbell, William
    Gwon, Youngjune
    Vattam, Swaroop
    Dehak, Najim
    Mallidi, Harish
    Nidadavolu, Phani Sankar
    Li, Ruizhi
    Dehak, Reda
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1333 - 1337
  • [5] IFLY SYSTEM FOR THE NIST 2008 SPEAKER RECOGNITION EVALUATION
    Guo, Wu
    Long, Yanhua
    Li, Yijie
    Pan, Lei
    Wang, Eryu
    Dai, Lirong
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4209 - 4212
  • [6] THE HKCUPU SYSTEM FOR THE NIST 2010 SPEAKER RECOGNITION EVALUATION
    Jiang, Weiwu
    Mak, Man-Wai
    Rao, Wei
    Meng, Helen
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5288 - 5291
  • [7] THE SRI NIST 2008 SPEAKER RECOGNITION EVALUATION SYSTEM
    Kajarekar, Sachin S.
    Scheffer, Nicolas
    Graciarena, Martin
    Shriberg, Elizabeth
    Stolcke, Andreas
    Ferrer, Luciana
    Bocklet, Tobias
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4205 - 4208
  • [8] THE SRI NIST 2010 SPEAKER RECOGNITION EVALUATION SYSTEM
    Scheffer, Nicolas
    Ferrer, Luciana
    Graciarena, Martin
    Kajarekar, Sachin
    Shriberg, Elizabeth
    Stolcke, Andreas
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5292 - 5295
  • [9] STBU system for the NIST 2006 speaker recognition evaluation
    Matejka, P.
    Burget, L.
    Schwarz, P.
    Glembek, O.
    Karafiat, M.
    Grezl, F.
    Cernocky, J.
    van Leeuwen, D. A.
    Bruemmer, N.
    Strasheim, A.
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 221 - +
  • [10] UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation
    Zhang, Chunlei
    Bahmaninezhad, Fahimeh
    Ranjan, Shivesh
    Yu, Chengzhu
    Shokouhi, Navid
    Hansen, John H. L.
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1343 - 1347