X-VECTORS: ROBUST DNN EMBEDDINGS FOR SPEAKER RECOGNITION

Cited by: 0
Authors
Snyder, David [1]
Garcia-Romero, Daniel
Sell, Gregory
Povey, Daniel
Khudanpur, Sanjeev
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Funding
National Science Foundation (United States)
Keywords
speaker recognition; deep neural networks; data augmentation; x-vectors; noise
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we use data augmentation to improve performance of deep neural network (DNN) embeddings for speaker recognition. The DNN, which is trained to discriminate between speakers, maps variable-length utterances to fixed-dimensional embeddings that we call x-vectors. Prior studies have found that embeddings leverage large-scale training datasets better than i-vectors. However, it can be challenging to collect substantial quantities of labeled data for training. We use data augmentation, consisting of added noise and reverberation, as an inexpensive method to multiply the amount of training data and improve robustness. The x-vectors are compared with i-vector baselines on Speakers in the Wild and NIST SRE 2016 Cantonese. We find that while augmentation is beneficial in the PLDA classifier, it is not helpful in the i-vector extractor. However, the x-vector DNN effectively exploits data augmentation, due to its supervised training. As a result, the x-vectors achieve superior performance on the evaluation datasets.
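The augmentation described in the abstract (added noise and reverberation) and the mapping from variable-length input to a fixed-dimensional vector can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation. The function names (`add_noise`, `add_reverb`, `stats_pool`), the SNR value, and the toy impulse response are hypothetical choices made for illustration. The mean-and-standard-deviation pooling is an assumption about how a fixed-length vector can be obtained from frame-level features, since the abstract does not describe the pooling mechanism.

```python
import math
import random


def power(x):
    """Average power of a signal given as a list of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise(speech, noise, snr_db):
    """Mix noise into speech at a target signal-to-noise ratio in dB."""
    # Repeat the noise so it covers the utterance, then trim to length.
    reps = -(-len(speech) // len(noise))  # ceiling division
    noise = (noise * reps)[: len(speech)]
    # Scale noise so that power(speech) / power(scaled noise) = 10^(snr_db / 10).
    scale = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def add_reverb(speech, rir):
    """Convolve speech with a room impulse response, truncated to the input length."""
    out = [0.0] * len(speech)
    for i, s in enumerate(speech):
        for j, h in enumerate(rir):
            if i + j >= len(speech):
                break
            out[i + j] += s * h
    return out


def stats_pool(frames):
    """Pool a list of equal-size frame vectors into one vector of means and stds."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds  # length 2 * dim regardless of the number of frames


# Toy usage with synthetic signals (hypothetical values, not real speech).
random.seed(0)
speech = [random.gauss(0, 1) for _ in range(400)]
noise = [random.gauss(0, 1) for _ in range(150)]
noisy_copy = add_noise(speech, noise, snr_db=10)
reverberant_copy = add_reverb(speech, [1.0, 0.0, 0.5, 0.25])
```

Each clean utterance yields additional training copies that keep the same speaker label, which is how augmentation multiplies the amount of labeled data without new collection. The pooling function returns a vector whose length depends only on the frame dimension, not on the utterance duration.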
Pages: 5329 - 5333
Page count: 5
Related papers
  • [1] SPEAKER RECOGNITION FOR MULTI-SPEAKER CONVERSATIONS USING X-VECTORS
    Snyder, David
    Garcia-Romero, Daniel
    Sell, Gregory
    McCree, Alan
    Povey, Daniel
    Khudanpur, Sanjeev
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5796 - 5800
  • [2] Speaker Recognition from Distance Using X-Vectors with Reverberation-Robust Features
    Witkowski, Marcin
    Rybicka, Magdalena
    Kowalczyk, Konrad
    [J]. 2019 SIGNAL PROCESSING ALGORITHMS, ARCHITECTURES, ARRANGEMENTS, AND APPLICATIONS (SPA 2019), 2019, : 235 - 240
  • [3] Investigation of DNN based Feature Enhancement Jointly Trained with X-Vectors for Noise-Robust Speaker Verification
    Yang, Joon-Young
    Park, Kwan-Ho
    Chang, Joon-Hyuk
    Kim, Youngsam
    Cho, Sangrae
    [J]. 2020 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2020,
  • [4] X-VECTORS MEET EMOTIONS: A STUDY ON DEPENDENCIES BETWEEN EMOTION AND SPEAKER RECOGNITION
    Pappagari, Raghavendra
    Wang, Tianzi
    Villalba, Jesus
    Chen, Nanxin
    Dehak, Najim
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7169 - 7173
  • [5] Joint optimization of neural acoustic beamforming and dereverberation with x-vectors for robust speaker verification
    Yang, Joon-Young
    Chang, Joon-Hyuk
    [J]. INTERSPEECH 2019, 2019, : 4075 - 4079
  • [6] ROBUST SPEAKER RECOGNITION BASED ON DNN/I-VECTORS AND SPEECH SEPARATION
    Chang, Jorge
    Wang, DeLiang
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5415 - 5419
  • [7] A transfer learning SHM strategy for bridges enriched by the use of speaker recognition x-vectors
    Eleonora M. Tronci
    Homayoon Beigi
    Maria Q. Feng
    Raimondo Betti
    [J]. Journal of Civil Structural Health Monitoring, 2022, 12 : 1285 - 1298
  • [8] A transfer learning SHM strategy for bridges enriched by the use of speaker recognition x-vectors
    Tronci, Eleonora M.
    Beigi, Homayoon
    Feng, Maria Q.
    Betti, Raimondo
    [J]. JOURNAL OF CIVIL STRUCTURAL HEALTH MONITORING, 2022, 12 (06) : 1285 - 1298
  • [9] Deep neural network based forensic automatic speaker recognition in VOCALISE using x-vectors
    Kelly, Finnian
    Forth, Oscar
    Kent, Samuel
    Gerlach, Linda
    Alexander, Anil
    [J]. 2019 AES INTERNATIONAL CONFERENCE ON AUDIO FORENSICS, 2019,
  • [10] Weighted X-Vectors for Robust Text-Independent Speaker Verification with Multiple Enrollment Utterances
    Mohammadi, Mohsen
    Mohammadi, Hamid Reza Sadegh
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (05) : 2825 - 2844