DEEP NEURAL NETWORK-BASED SPEAKER EMBEDDINGS FOR END-TO-END SPEAKER VERIFICATION

Cited by: 0
Authors
Snyder, David [1]
Ghahremani, Pegah
Povey, Daniel
Garcia-Romero, Daniel
Carmiel, Yishay
Khudanpur, Sanjeev
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Keywords
speaker verification; deep neural networks; end-to-end training; RECOGNITION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this study, we investigate an end-to-end text-independent speaker verification system. The architecture consists of a deep neural network that takes a variable-length speech segment and maps it to a speaker embedding. The objective function separates same-speaker and different-speaker pairs, and is reused during verification. Similar systems have recently shown promise for text-dependent verification, but we believe that this is unexplored for the text-independent task. We show that, given a large number of training speakers, the proposed system outperforms an i-vector baseline in equal error rate (EER) and at low miss rates. Relative to the baseline, the end-to-end system reduces EER by 13% on average and 29% pooled across test conditions. The fused system achieves a reduction of 32% on average and 38% pooled.
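
As a rough illustration of the metric quoted in the abstract, the sketch below approximates an equal error rate from two lists of verification scores and computes a relative reduction between two EER values. It is a minimal sketch and not the authors' evaluation code: the function names `equal_error_rate` and `relative_reduction`, the threshold sweep over observed scores, and all numbers in the usage lines are assumptions made for this example.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption for this sketch: higher scores mean "more likely same speaker".
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # Miss: a same-speaker trial scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a different-speaker trial scored at or above it.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (miss + fa) / 2
    return eer


def relative_reduction(baseline_eer, system_eer):
    """Relative EER reduction, e.g. 0.13 for a 13% reduction."""
    return (baseline_eer - system_eer) / baseline_eer


# Hypothetical scores, chosen only to exercise the functions.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.1, 0.2, 0.4, 0.6]
print(equal_error_rate(targets, nontargets))
print(relative_reduction(0.10, 0.087))
```

"Average" and "pooled" figures differ in where the averaging happens: an average reduction is computed per test condition and then averaged, while a pooled figure comes from scoring the trials of all conditions together.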
Pages: 165-170
Number of pages: 6
Related papers
50 in total
  • [1] Shortcut Connections based Deep Speaker Embeddings for End-to-End Speaker Verification System
    Seo, Soonshin
    Rim, Daniel Jun
    Lim, Minkyu
    Lee, Donghyun
    Park, Hosung
    Oh, Junseok
    Kim, Changmin
    Kim, Ji-Hwan
    [J]. INTERSPEECH 2019, 2019, : 2928 - 2932
  • [2] End-To-End Phonetic Neural Network Approach for Speaker Verification
    Demirbag, Sedat
    Erden, Mustafa
    Arslan, Levent
    [J]. 2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,
  • [3] Neural PLDA Modeling for End-to-End Speaker Verification
    Ramoji, Shreyas
    Krishnan, Prashant
    Ganapathy, Sriram
    [J]. INTERSPEECH 2020, 2020, : 4333 - 4337
  • [4] A High-Performance Neural Network SoC for End-to-End Speaker Verification
    Tsai, Tsung-Han
    Chiang, Meng-Jui
    [J]. IEEE Access, 2024, 12 : 165482 - 165496
  • [5] SPEAKER-AWARE TRAINING OF ATTENTION-BASED END-TO-END SPEECH RECOGNITION USING NEURAL SPEAKER EMBEDDINGS
    Rouhe, Aku
    Kaseva, Tuomas
    Kurimo, Mikko
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7064 - 7068
  • [6] GENERALIZED END-TO-END LOSS FOR SPEAKER VERIFICATION
    Wan, Li
    Wang, Quan
    Papir, Alan
    Moreno, Ignacio Lopez
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4879 - 4883
  • [7] On Deep Speaker Embeddings for Speaker Verification
    Jakubec, Maros
    Jarina, Roman
    Lieskovska, Eva
    Chmulik, Michal
    [J]. 2021 44TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2021, : 162 - 166
  • [8] RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification
    Jung, Jee-weon
    Heo, Hee-Soo
    Kim, Ju-ho
    Shim, Hye-jin
    Yu, Ha-Jin
    [J]. INTERSPEECH 2019, 2019, : 1268 - 1272
  • [9] TOWARDS END-TO-END SPEAKER DIARIZATION WITH GENERALIZED NEURAL SPEAKER CLUSTERING
    Zhang, Chunlei
    Shi, Jiatong
    Weng, Chao
    Yu, Meng
    Yu, Dong
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8372 - 8376
  • [10] End-to-end losses based on speaker basis vectors and all-speaker hard negative mining for speaker verification
    Heo, Hee-Soo
    Jung, Jee-weon
    Yang, IL-Ho
    Yoon, Sung-Hyun
    Shim, Hye-jin
    Yu, Ha-Jin
    [J]. INTERSPEECH 2019, 2019, : 4035 - 4039