Deep Speaker Embeddings for Speaker Verification of Children

Authors
Abed, Mohammed Hamzah [1]
Sztaho, David [1]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Artificial Intelligence, Magyar Tudosok Korutja 2, H-1117 Budapest, Hungary
Keywords
Forensic voice comparison; children speaker verification; X-vector; RESNET-TDNN; ECAPA-TDNN; likelihood-ratio framework; identification
DOI
10.1007/978-3-031-70566-3_6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep speaker embedding models are currently the most advanced feature extraction methods for speaker verification. However, their effectiveness in identifying children's voices has not been thoroughly researched. While various methods have been proposed in recent years, most of them concentrate on adult speakers, with fewer researchers focusing on children. This study examines three deep learning-based speaker embedding methods and their ability to differentiate between child speakers in speaker verification. The X-vector, ECAPA-TDNN, and RESNET-TDNN methods were evaluated for forensic voice comparison, both as pre-trained models and after fine-tuning on children's speech samples. Evaluation followed the likelihood-ratio framework, with likelihood-ratio scores calculated from children's voices. The Samromur Children dataset was used to evaluate the workflow. It comprises 131 h of speech from 3175 speakers of both sexes aged between 4 and 17. The results indicate that, without fine-tuning the embedding models, RESNET-TDNN has the lowest EER and Cllr(min) values (10.8% and 0.368, respectively). With fine-tuning, ECAPA-TDNN performs best (EER and Cllr(min) of 2.9% and 0.111, respectively). No difference was found between the sexes of the speakers. When the results were analysed by speaker age range (4-10, 11-15, and 16-17), varying levels of performance were observed. The younger speakers were less accurately identified using the original pre-trained models. However, after fine-tuning, this tendency changed slightly. The results indicate that the models could be used in real-life investigation cases and that fine-tuning helps mitigate the performance degradation in young speakers.
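The two metrics reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `cllr` and `eer` and the toy scores are hypothetical, the scores are assumed to be natural-log likelihood ratios, and the EER is approximated by a simple threshold sweep. The abstract reports Cllr(min), which additionally requires optimal calibration of the scores; that step is omitted here, so the sketch computes plain Cllr.

```python
import math


def cllr(target_llrs, nontarget_llrs):
    """Log-likelihood-ratio cost (Cllr) from natural-log likelihood ratios."""
    # Same-speaker trials are penalised when their log-LR is low.
    tar = sum(math.log2(1.0 + math.exp(-s)) for s in target_llrs) / len(target_llrs)
    # Different-speaker trials are penalised when their log-LR is high.
    non = sum(math.log2(1.0 + math.exp(s)) for s in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (tar + non)


def eer(target_scores, nontarget_scores):
    """Approximate equal error rate by sweeping every observed score as a threshold."""
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: same-speaker trial scored below the threshold.
        fnr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trial scored at or above it.
        fpr = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(fnr - fpr)
        if gap < best_gap:
            best_gap, best_eer = gap, 0.5 * (fnr + fpr)
    return best_eer


if __name__ == "__main__":
    # Hypothetical toy scores, not data from the study.
    same_speaker = [2.1, 1.4, 0.3, -0.2]
    different_speaker = [-1.8, -0.9, 0.1, -2.5]
    print("Cllr:", cllr(same_speaker, different_speaker))
    print("EER:", eer(same_speaker, different_speaker))
```

A system whose log-LRs are all zero carries no information and scores a Cllr of 1, so values well below 1, such as those in the abstract, indicate informative likelihood ratios.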
Pages: 58-69
Number of pages: 12