Deep Speaker Embeddings for Speaker Verification of Children

Authors
Abed, Mohammed Hamzah [1]
Sztaho, David [1]
Affiliation
[1] Budapest Univ Technol & Econ, Dept Telecommun & Artificial Intelligence, Magyar Tudosok Korutja 2, H-1117 Budapest, Hungary
Keywords
Forensic voice comparison; children speaker verification; X-vector; RESNET-TDNN; ECAPA-TDNN; likelihood-ratio framework; IDENTIFICATION
DOI
10.1007/978-3-031-70566-3_6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep speaker embedding models are currently the most advanced feature extraction methods for speaker verification. However, their effectiveness in identifying children's voices has not been thoroughly researched. Although various methods have been proposed in recent years, most concentrate on adult speakers, and fewer researchers have focused on children. This study examines three deep learning-based speaker embedding methods and their ability to differentiate between child speakers in speaker verification. The X-vector, ECAPA-TDNN, and RESNET-TDNN methods were evaluated for forensic voice comparison, both as pre-trained models and after fine-tuning on children's speech samples. Evaluation followed the likelihood-ratio framework, with likelihood-ratio scores calculated from children's voices. The Samromur Children dataset was used to evaluate the workflow. It comprises 131 hours of speech from 3,175 speakers of both sexes aged 4 to 17. Without fine-tuning the embedding models, RESNET-TDNN has the lowest equal error rate (EER) and minimum log-likelihood-ratio cost (Cllr_min) values (10.8% and 0.368, respectively). With fine-tuning, ECAPA-TDNN performs best (EER and Cllr_min of 2.9% and 0.111, respectively). No difference was found between the sexes of the speakers. When the results were analysed by speaker age range (4-10, 11-15, and 16-17), varying levels of performance were observed. Younger speakers were identified less accurately by the original pre-trained models. After fine-tuning, this tendency changed slightly. The results indicate that the models could be used in real-life investigation cases and that fine-tuning helps mitigate the performance degradation for young speakers.
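The two metrics reported in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names `cllr` and `eer` and the toy scores are assumptions made for illustration, and the inputs are assumed to be natural-log likelihood ratios for same-speaker and different-speaker trials. The sketch computes the plain Cllr; the Cllr_min reported in the abstract additionally requires an optimal calibration step, which is not shown here.

```python
import math


def _log2_1p_exp(x):
    """Numerically stable log2(1 + e^x)."""
    if x > 0:
        return (x + math.log1p(math.exp(-x))) / math.log(2)
    return math.log1p(math.exp(x)) / math.log(2)


def cllr(llr_same, llr_diff):
    """Log-likelihood-ratio cost from natural-log LRs.

    Same-speaker trials are penalised by log2(1 + 1/LR),
    different-speaker trials by log2(1 + LR).
    """
    cost_same = sum(_log2_1p_exp(-x) for x in llr_same) / len(llr_same)
    cost_diff = sum(_log2_1p_exp(x) for x in llr_diff) / len(llr_diff)
    return 0.5 * (cost_same + cost_diff)


def eer(scores_same, scores_diff):
    """Approximate equal error rate by sweeping observed scores as thresholds."""
    best_gap, best_rate = float("inf"), None
    for t in sorted(set(scores_same) | set(scores_diff)):
        # Miss rate: same-speaker trials scoring below the threshold.
        fnr = sum(s < t for s in scores_same) / len(scores_same)
        # False-alarm rate: different-speaker trials at or above it.
        fpr = sum(s >= t for s in scores_diff) / len(scores_diff)
        gap = abs(fnr - fpr)
        if gap < best_gap:
            best_gap, best_rate = gap, 0.5 * (fnr + fpr)
    return best_rate


if __name__ == "__main__":
    # Hypothetical toy scores, not data from the study.
    same = [2.1, 3.4, 0.7, 1.9]
    diff = [-2.5, -0.3, -1.8, 0.9]
    print("Cllr:", cllr(same, diff))
    print("EER:", eer(same, diff))
```

A system whose likelihood ratios are all 1 (log-LR of 0) carries no information and yields a Cllr of 1, so values well below 1, such as those in the abstract, indicate informative output.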
Pages: 58-69
Number of pages: 12