Limited Labels for Unlimited Data: Active Learning for Speaker Recognition

Authors
Shum, Stephen H. [1 ]
Dehak, Najim [1 ]
Glass, James R. [1 ]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
speaker recognition; i-vectors; active learning; verification
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we attempt to quantify the amount of labeled data necessary to build a state-of-the-art speaker recognition system. We begin by using i-vectors and the cosine similarity metric to represent an unlabeled set of utterances, then obtain labels from a noiseless oracle in the form of pairwise queries. Finally, we use the resulting speaker clusters to train a PLDA scoring function, which is assessed on the 2010 NIST Speaker Recognition Evaluation. After presenting the initial results of an algorithm that sorts queries based on nearest-neighbor pairs, we develop techniques that further minimize the number of queries needed to obtain state-of-the-art performance. We show the generalizability of our methods in anecdotal fashion by applying our methods to two different distributions of utterances-per-speaker and, ultimately, find that the actual number of pairwise labels needed to obtain state-of-the-art results may be a mere fraction of the queries required to fully label the entire set of utterances.
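The pairwise-query idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes small toy vectors in place of real i-vectors, a hypothetical `oracle(i, j)` callback that answers "same speaker?" without noise, and a simple greedy rule that asks about the most cosine-similar pairs first and merges confirmed matches with a union-find structure. The names `cosine`, `active_cluster`, and `budget` are illustrative only.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def active_cluster(vectors, oracle, budget):
    """Greedy sketch: query the most similar pairs first, merge confirmed matches.

    vectors: list of toy embeddings (stand-ins for i-vectors; an assumption).
    oracle:  hypothetical noiseless callback, oracle(i, j) -> bool.
    budget:  maximum number of pairwise queries to spend.
    Returns (cluster label per utterance, number of queries used).
    """
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Rank all candidate pairs by cosine similarity, most similar first.
    pairs = sorted(
        combinations(range(len(vectors)), 2),
        key=lambda p: cosine(vectors[p[0]], vectors[p[1]]),
        reverse=True,
    )

    queries = 0
    for i, j in pairs:
        if queries >= budget:
            break
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            continue  # already implied by earlier answers; no query spent
        queries += 1
        if oracle(i, j):
            parent[root_j] = root_i  # confirmed same speaker: merge clusters

    return [find(i) for i in range(len(vectors))], queries
```

As a usage example under the same assumptions, four utterances from two speakers form 4 * 3 / 2 = 6 possible pairs, yet with `budget=2` the two most similar pairs are queried first and, if the oracle confirms both, the two speaker clusters are already recovered. The resulting clusters would then serve as training labels for a scoring function such as PLDA, which this sketch does not implement.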
Pages: 383-387
Page count: 5