Deep Speaker Embeddings for Speaker Verification of Children

Cited by: 0
Authors
Abed, Mohammed Hamzah [1 ]
Sztaho, David [1 ]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Artificial Intelligence, Magyar Tudosok Korutja 2, H-1117 Budapest, Hungary
Keywords
Forensic voice comparison; children speaker verification; X-vector; RESNET-TDNN; ECAPA-TDNN; likelihood-ratio framework; IDENTIFICATION;
DOI
10.1007/978-3-031-70566-3_6
Chinese Library Classification: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Currently, deep speaker embedding models are the most advanced feature-extraction methods for speaker verification, but their effectiveness on children's voices has not been thoroughly researched. While various methods have been proposed in recent years, most concentrate on adult speakers, with fewer researchers focusing on children. This study examines three deep learning-based speaker embedding methods and their ability to differentiate between child speakers in speaker verification. It evaluates the X-vector, ECAPA-TDNN, and RESNET-TDNN methods for forensic voice comparison, using pre-trained models and fine-tuning them on children's speech samples. Evaluations were carried out in the likelihood-ratio framework, with likelihood-ratio scores computed on children's voices. The workflow was evaluated on the Samromur Children dataset, which comprises 131 hours of speech from 3175 speakers of both sexes, aged 4 to 17. The results indicate that RESNET-TDNN achieves the lowest EER and Cllr(min) values (10.8% and 0.368, respectively) without fine-tuning of the embedding models; with fine-tuning, ECAPA-TDNN performs best (EER and Cllr(min) of 2.9% and 0.111, respectively). No difference was found between the sexes of the speakers. When the results were analysed by age range (4-10, 11-15, and 16-17), varying levels of performance were observed: younger speakers were identified less accurately by the original pre-trained models, although this tendency changed slightly after fine-tuning. The results indicate that the models could be used in real-life investigation cases and that fine-tuning helps mitigate the performance degradation for young speakers.
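The abstract reports performance as EER (equal error rate), the operating point at which the false-accept and false-reject rates of a verification system are equal. As a minimal illustrative sketch (not code from the study), the EER can be estimated from lists of same-speaker and different-speaker trial scores by sweeping a decision threshold; the score values below are toy data, not results from the paper.

```python
# Hedged sketch: estimating the equal error rate (EER) from verification
# trial scores, where higher scores indicate "same speaker".
# The score lists are illustrative toy values, not data from the study.

def eer(genuine, impostor):
    """Return the EER by sweeping a threshold over all observed scores."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false-accept rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false-reject rate
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)  # EER taken where FAR and FRR meet
    return best[1]

genuine = [0.9, 0.8, 0.85, 0.7, 0.95]   # same-speaker trial scores (toy)
impostor = [0.3, 0.4, 0.2, 0.6, 0.75]   # different-speaker trial scores (toy)
print(eer(genuine, impostor))           # → 0.2
```

In practice the scores would come from comparing deep speaker embeddings (e.g. cosine similarity between ECAPA-TDNN vectors), and Cllr(min) would additionally measure the quality of the calibrated likelihood ratios rather than a single threshold.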
Pages: 58-69 (12 pages)
Related papers (50 total)
  • [31] Attentive Deep CNN for Speaker Verification
    Yu, Yong-bin
    Qi, Min-hui
    Tang, Yi-fan
    Deng, Quan-xin
    Peng, Chen-hui
    Mai, Feng
    Nyima, Tashi
    TWELFTH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING SYSTEMS, 2021, 11719
  • [32] Deep Speaker Embedding with Frame-Constrained Training Strategy for Speaker Verification
    Gu, Bin
    INTERSPEECH 2022, 2022, : 1451 - 1455
  • [33] SPEAKER DIARIZATION USING DEEP NEURAL NETWORK EMBEDDINGS
    Garcia-Romero, Daniel
    Snyder, David
    Sell, Gregory
    Povey, Daniel
    McCree, Alan
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4930 - 4934
  • [34] Combining Deep Speaker Specific Representations with GMM-SVM for Speaker Verification
    Price, Ryan
    Biswas, Sangeeta
    Shinoda, Koichi
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2787 - 2791
  • [35] Effective speaker adaptations for speaker verification
    Ahn, S
    Kang, S
    Ko, H
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1081 - 1084
  • [36] Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition
    Cai, Danwei
    Cai, Zexin
    Li, Ming
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1478 - 1482
  • [37] Speaker-Corrupted Embeddings for Online Speaker Diarization
    Ghahabi, Omid
    Fischer, Volker
    INTERSPEECH 2019, 2019, : 386 - 390
  • [38] CONTENT-AWARE SPEAKER EMBEDDINGS FOR SPEAKER DIARISATION
    Sun, G.
    Liu, D.
    Zhang, C.
    Woodland, P. C.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7168 - 7172
  • [39] Speaker Verification based on extraction of Deep Features
    Mitsianis, Evangelos
    Spyrou, Evaggelos
    Giannakopoulos, Theodore
    10TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE (SETN 2018), 2018,
  • [40] SPEAKER VERIFICATION
    CHAPMAN, WD
    LI, KP
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1966, 40 (05): 1282