Deep speaker embeddings for Speaker Verification: Review and experimental comparison

Times cited: 7
Authors
Jakubec, Maros [1]
Jarina, Roman [1]
Lieskovska, Eva [2]
Kasak, Peter [1]
Affiliations
[1] Univ Zilina, FEIT, Zilina, Slovakia
[2] Univ Sci Pk UNIZA, Zilina, Slovakia
Keywords
Automatic speaker verification; Speaker embeddings; Deep learning; i-vector; d-vector; x-vector; r-vector; Benchmark evaluation; VoxCeleb; Recognition
DOI
10.1016/j.engappai.2023.107232
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The construction of speaker-specific acoustic models for automatic speaker recognition is almost exclusively based on speaker embeddings extracted by deep neural networks. This work reviews recent progress in speaker embedding development and presents an experimental benchmark comparison of state-of-the-art deep speaker representations on a Speaker Verification (SV) task. A performance evaluation of existing and proposed models on the VoxCeleb1 benchmark database shows that SV systems based on r-vectors with a Res2Net convolutional architecture, multi-head attention pooling, and additive margin softmax outperform other solutions such as d-vectors, x-vectors, and conventional r-vectors. In addition, an ensemble network that fuses the best-performing speaker embeddings is proposed. We found that different types of embeddings can contain complementary speaker-related information, and we show that concatenating x-vectors and r-vectors further improves the performance of the SV system. The best-performing embedding ensemble achieves an Equal Error Rate (EER) of 2.52% on the VoxCeleb1 benchmark test, which is lower than other published results obtained on the same dataset with the standard VoxCeleb1 evaluation methodology.
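The abstract reports results as an Equal Error Rate and describes fusing x-vectors and r-vectors by concatenation. The following is a minimal sketch of how such a fused embedding might be scored and how an EER can be estimated from trial scores. It is not the authors' implementation: cosine scoring, L2-normalizing each embedding before concatenation, and the helper names (`l2_normalize`, `fuse_concat`, `cosine_score`, `compute_eer`) are all assumptions made for illustration, since the abstract does not state the scoring back-end or the normalization used.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)
    return [v / norm for v in vec]


def fuse_concat(x_vector: Sequence[float], r_vector: Sequence[float]) -> List[float]:
    """Hypothetical fusion: L2-normalize each embedding, then concatenate them.

    Normalizing first is an assumption (not taken from the paper); it keeps one
    embedding type from dominating the fused vector purely because of its scale.
    """
    return l2_normalize(x_vector) + l2_normalize(r_vector)


def cosine_score(enroll: Sequence[float], test: Sequence[float]) -> float:
    """Cosine similarity between an enrollment embedding and a test embedding."""
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = math.sqrt(sum(a * a for a in enroll)) * math.sqrt(sum(b * b for b in test))
    return dot / norm if norm > 0.0 else 0.0


def compute_eer(target_scores: Sequence[float], nontarget_scores: Sequence[float]) -> float:
    """Approximate the Equal Error Rate by sweeping a threshold over all observed scores.

    A trial is accepted when score >= threshold. At each threshold:
      FRR = share of target (same-speaker) trials that are rejected,
      FAR = share of non-target (different-speaker) trials that are accepted.
    The EER is taken as the mean of FAR and FRR where the two are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    # Toy embeddings; real x-vectors and r-vectors have hundreds of dimensions.
    enroll = fuse_concat([0.2, 0.9, 0.1], [0.5, 0.5])
    same = fuse_concat([0.25, 0.85, 0.05], [0.45, 0.55])
    other = fuse_concat([0.9, 0.1, 0.3], [0.9, -0.2])
    print(cosine_score(enroll, same), cosine_score(enroll, other))
    print(compute_eer([0.9, 0.4], [0.5, 0.1]))  # 0.5
```

The threshold sweep is the simplest way to estimate the EER; with many trials, toolkits usually interpolate on the ROC curve instead, which gives a slightly different value on small trial lists.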
Pages: 14