Deep speaker embeddings for Speaker Verification: Review and experimental comparison

Cited by: 7
Authors
Jakubec, Maros [1]
Jarina, Roman [1]
Lieskovska, Eva [2]
Kasak, Peter [1]
Affiliations
[1] Univ Zilina, FEIT, Zilina, Slovakia
[2] Univ Sci Pk UNIZA, Zilina, Slovakia
Keywords
Automatic speaker verification; Speaker embeddings; Deep learning; i-vector; d-vector; x-vector; r-vector; Benchmark evaluation; VoxCeleb; Recognition
DOI
10.1016/j.engappai.2023.107232
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Speaker-specific acoustic models for automatic speaker recognition are almost exclusively built on speaker embeddings extracted by deep neural networks. This work aims to review recent progress in the development of speaker embeddings and to present an experimental benchmark comparison of state-of-the-art deep speaker representations for the Speaker Verification (SV) task. Evaluation of the existing and proposed models on the VoxCeleb1 benchmark database shows that SV systems based on r-vectors with a Res2Net convolutional architecture, multi-head attention pooling, and additive margin softmax outperform other solutions such as d-vectors, x-vectors, and conventional r-vectors. In addition, an ensemble network that fuses the best-performing speaker embeddings is proposed. We found that different types of embeddings can contain complementary speaker-related information, and we show that concatenating x-vectors and r-vectors can further improve the performance of the SV system. The best-performing embedding ensemble achieves an Equal Error Rate (EER) of 2.52% on the VoxCeleb1 benchmark test, which is lower than other published results obtained on the same dataset with the standard VoxCeleb1 evaluation methodology.
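The fusion and evaluation ideas in the abstract can be illustrated with a minimal sketch. It assumes that each utterance already has an x-vector and an r-vector, that fusion is plain concatenation of L2-normalized embeddings, and that trials are scored with cosine similarity; the authors' actual ensemble network and scoring back-end may differ. All function names, the embedding dimensions (512 and 256), and the random demo data are hypothetical. The EER routine approximates the threshold at which the false-acceptance and false-rejection rates meet by checking every cut point in the score-sorted trial list.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row (one embedding per row) to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def fuse_by_concatenation(*embeddings: np.ndarray) -> np.ndarray:
    """Concatenate embeddings of different types (assumed: x-vector, r-vector).

    Each type is L2-normalized first so that no type dominates the fused vector
    through its scale alone; the concatenated vector is then re-normalized.
    """
    parts = [l2_normalize(e) for e in embeddings]
    return l2_normalize(np.concatenate(parts, axis=1))


def cosine_scores(enroll: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Cosine similarity between paired rows of two unit-norm matrices."""
    return np.sum(enroll * test, axis=1)


def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    """Approximate EER, where false acceptance equals false rejection.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    No interpolation is done, and tied scores are not treated specially.
    """
    order = np.argsort(scores)[::-1]  # trials sorted by descending score
    sorted_labels = labels[order]
    n_target = sorted_labels.sum()
    n_nontarget = len(sorted_labels) - n_target
    # Counts of accepted targets / non-targets when the top-k trials are accepted, k = 0..N.
    tp = np.concatenate([[0], np.cumsum(sorted_labels)])
    fp = np.concatenate([[0], np.cumsum(1 - sorted_labels)])
    far = fp / n_nontarget  # false-acceptance rate
    frr = 1 - tp / n_target  # false-rejection rate
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_trials = 200
    labels = rng.integers(0, 2, size=n_trials)

    def make_pair(dim: int):
        """Hypothetical stand-in data: target test vectors share the enrollment vector."""
        enroll = rng.normal(size=(n_trials, dim))
        noise = rng.normal(size=(n_trials, dim))
        test = np.where(labels[:, None] == 1, enroll + noise, noise)
        return enroll, test

    x_enroll, x_test = make_pair(512)  # hypothetical x-vectors
    r_enroll, r_test = make_pair(256)  # hypothetical r-vectors
    scores = cosine_scores(
        fuse_by_concatenation(x_enroll, r_enroll),
        fuse_by_concatenation(x_test, r_test),
    )
    print(f"EER of the fused system: {100 * equal_error_rate(scores, labels):.2f}%")
```

For example, a trial list in which every target score exceeds every non-target score yields an EER of 0.0. Normalizing each embedding type before concatenation gives both types equal weight in the cosine score; a learned fusion layer, such as the ensemble network the abstract mentions, would instead learn that weighting from data.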
Pages: 14