Training speaker recognition systems with limited data

Cited by: 1
Authors
Vaessen, Nik [1]
van Leeuwen, David A. [1]
Affiliations
[1] Radboud Univ Nijmegen, Inst Comp & Informat Sci, Nijmegen, Netherlands
Source
Keywords
speaker recognition; few-shot learning; wav2vec2;
DOI
10.21437/Interspeech.2022-135
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This work considers training neural networks for speaker recognition with a much smaller dataset than is used in contemporary work. We artificially restrict the amount of data by proposing three subsets of the popular VoxCeleb2 dataset. These subsets are restricted to 50k audio files (versus over 1M files available) and vary in the number of speakers and in session variability. We train three speaker recognition systems on these subsets: the X-vector, ECAPA-TDNN, and wav2vec2 network architectures. We show that the self-supervised, pre-trained weights of wav2vec2 substantially improve performance when training data is limited. Code and data subsets are available at https://github.com/nikvaessen/w2v2-speaker-few-samples.
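The idea of a fixed file budget spread over a varying number of speakers can be illustrated with a minimal sketch. This is not the authors' procedure (their actual subsets are published in the repository above). The function name `sample_subset`, the `(speaker_id, session_id, path)` tuple layout, and the even per-speaker split are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_subset(files, num_speakers, budget=50_000, seed=0):
    """Draw at most `budget` files from `num_speakers` randomly chosen speakers.

    `files` is an iterable of (speaker_id, session_id, path) tuples
    (a hypothetical layout). The budget is split evenly across speakers,
    so fewer speakers means more files per speaker.
    """
    rng = random.Random(seed)

    # Group all files by speaker.
    by_speaker = defaultdict(list)
    for item in files:
        by_speaker[item[0]].append(item)

    # Pick the speakers, then an equal share of files from each one.
    speakers = rng.sample(sorted(by_speaker), num_speakers)
    per_speaker = budget // num_speakers

    subset = []
    for spk in speakers:
        pool = by_speaker[spk]
        subset.extend(rng.sample(pool, min(per_speaker, len(pool))))
    return subset
```

With the same budget, a small `num_speakers` yields many files per speaker and a large `num_speakers` yields few files per speaker. Controlling session variability would need an additional constraint on `session_id`, which this sketch does not implement.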
Pages: 4760 - 4764
Number of pages: 5
Related papers
50 in total
  • [1] SPEAKER RECOGNITION IN NOISY CONDITIONS WITH LIMITED TRAINING DATA
    McLaughlin, Niall
    Ming, Ji
    Crookes, Danny
    [J]. 19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1294 - 1298
  • [2] Automatic Speaker Recognition with Limited Data
    Li, Ruirui
    Jiang, Jyun-Yu
    Liu, Jiahao
    Hsieh, Chu-Cheng
    Wang, Wei
    [J]. PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM '20), 2020, : 340 - 348
  • [3] Comparison of Generative and Discriminative Approaches for Speaker Recognition with Limited Data
    Silovsky, Jan
    Cerva, Petr
    Zdansky, Jindrich
    [J]. RADIOENGINEERING, 2009, 18 (03) : 307 - 316
  • [4] Speaker recognition under limited data condition by noise addition
    Krishnamoorthy, P.
    Jayanna, H. S.
    Prasanna, S. R. M.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (10) : 13487 - 13490
  • [5] Limited Labels for Unlimited Data: Active Learning for Speaker Recognition
    Shum, Stephen H.
    Dehak, Najim
    Glass, James R.
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 383 - 387
  • [6] Unsupervised NAP Training Data Design for Speaker Recognition
    Sun, Hanwu
    Ma, Bin
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1098 - 1101
  • [7] Fuzzy vector quantization for speaker recognition under limited data conditions
    Jayanna, H. S.
    Prasanna, S. R. Mahadeva
    [J]. 2008 IEEE REGION 10 CONFERENCE: TENCON 2008, VOLS 1-4, 2008, : 124 - 127
  • [8] Limited data speaker identification
    H. S. Jayanna
    S. R. Mahadeva Prasanna
    [J]. Sadhana, 2010, 35 : 525 - 546
  • [9] Limited data speaker identification
    Jayanna, H. S.
    Prasanna, S. R. Mahadeva
    [J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2010, 35 (05): : 525 - 546
  • [10] An experimental comparison of modelling techniques for speaker recognition under limited data condition
    Jayanna, H. S.
    Prasanna, S. R. Mahadeva
    [J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2009, 34 (05): : 717 - 728