The Speakers in the Wild (SITW) Speaker Recognition Database

Cited by: 168
Authors
McLaren, Mitchell [1]
Ferrer, Luciana [2, 3]
Castan, Diego [1]
Lawson, Aaron [1]
Affiliations
[1] SRI Int, Speech Technol & Res Lab, Menlo Pk, CA 94025 USA
[2] Univ Buenos Aires, FCEN, Dept Comp, Buenos Aires, DF, Argentina
[3] Consejo Nacl Invest Cient & Tecn, Buenos Aires, DF, Argentina
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
speaker recognition; database; real-world data
DOI
10.21437/Interspeech.2016-1129
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The Speakers in the Wild (SITW) speaker recognition database contains hand-annotated speech samples from open-source media for the purpose of benchmarking text-independent speaker recognition technology on single- and multi-speaker audio acquired across unconstrained or "wild" conditions. The database consists of recordings of 299 speakers, with an average of eight different sessions per person. Unlike existing databases for speaker recognition, this data was not collected under controlled conditions and thus contains real noise, reverberation, intraspeaker variability and compression artifacts. These factors are often convolved in the real world, as the SITW data shows, and they make SITW a challenging database for single- and multi-speaker recognition.
Pages: 818-822
Page count: 5