Deep speaker embeddings for Speaker Verification: Review and experimental comparison

Cited by: 7

Authors:

Jakubec, Maros [1]

Jarina, Roman [1]

Lieskovska, Eva [2]

Kasak, Peter [1]

Affiliations:

[1] Univ Zilina, FEIT, Zilina, Slovakia

[2] Univ Sci Pk UNIZA, Zilina, Slovakia

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2024 / Vol. 127

Keywords:

Automatic speaker verification; Speaker embeddings; Deep learning; i-vector; d-vector; x-vector; r-vector; Benchmark evaluation; VoxCeleb; RECOGNITION;

DOI:

10.1016/j.engappai.2023.107232

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

The construction of speaker-specific acoustic models for automatic speaker recognition relies almost exclusively on deep neural network-based speaker embeddings. This work reviews recent progress in speaker embedding development and presents an experimental benchmark comparison of state-of-the-art deep speaker representations for the Speaker Verification (SV) task. The performance evaluation of the existing and proposed models on the VoxCeleb1 benchmark database shows that SV systems based on r-vectors with a Res2Net convolutional architecture, including multi-head attention pooling and additive margin softmax, outperform other solutions such as d-vectors, x-vectors and conventional r-vectors. In addition, an ensemble network is proposed that fuses the best-performing speaker embeddings. It was found that different types of embeddings can contain complementary speaker-related information. We show that a concatenation of x-vectors and r-vectors can further improve the performance of the SV system. The best-performing embedding ensemble achieves an Equal Error Rate of 2.52% on the VoxCeleb1 benchmark test, which is lower than other published results obtained on the same dataset using the standard VoxCeleb1 evaluation methodology.
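The two quantitative ideas in the abstract, fusing embeddings by concatenation and reporting an Equal Error Rate (EER), can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fuse`, `cosine_score`, `equal_error_rate`), the length normalization before concatenation, and cosine scoring as the back end are all assumptions made for illustration, and the abstract does not state how trials were scored.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit Euclidean length along the last axis."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def fuse(x_vec, r_vec):
    """Concatenate two embeddings of the same utterance.

    Assumption: each embedding is length-normalized first so that
    neither one dominates the fused vector by scale alone.
    """
    return np.concatenate([l2_normalize(x_vec), l2_normalize(r_vec)], axis=-1)


def cosine_score(enroll, test):
    """Cosine similarity between an enrollment and a test embedding."""
    return float(np.dot(l2_normalize(enroll), l2_normalize(test)))


def equal_error_rate(scores, labels):
    """Approximate EER from trial scores and target/impostor labels.

    Sweeps every observed score as a threshold and returns the mean of
    the false acceptance and false rejection rates at the threshold
    where the two rates are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_gap, eer = np.inf, 1.0
    for threshold in np.unique(scores):
        accepted = scores >= threshold
        far = np.mean(accepted[~labels])   # impostor trials accepted
        frr = np.mean(~accepted[labels])   # target trials rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

In use, each trial would pair a fused enrollment vector with a fused test vector, `cosine_score` would produce the trial score, and `equal_error_rate` would summarize all trials with a single number, the operating point at which false acceptances and false rejections are equally frequent.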


Pages: 14
