Tuning the performance of automatic speaker recognition in different conditions: effects of language and simulated voice disguise

Cited by: 1
Authors
Skarnitzl, Radek [1]
Asiaee, Maral [2]
Nourbakhsh, Mandana [3]
Affiliations
[1] Charles Univ Prague, Prague, Czech Republic
[2] Alzahra Univ, Gen Linguist, Tehran, Iran
[3] Alzahra Univ, Linguist Dept, Phonet Phonol & Psycholinguist, Tehran, Iran
Keywords
AUTOMATIC SPEAKER RECOGNITION; FORENSIC PHONETICS; VOICE DISGUISE; VERIFICATION
DOI
10.1558/ijsll.39778
Chinese Library Classification number
DF [Law]; D9 [Law];
Discipline classification code
0301;
Abstract
Automatic speaker recognition applications have often been described as a 'black box'. This study explores the benefit of two tuning procedures, condition adaptation and reference normalisation, implemented in VOCALISE, an automatic speaker recognition (ASR) system based on an i-vector PLDA framework. These procedures enable users to open the black box to a certain degree. Subsets of two 100-speaker databases, one of Czech and the other of Persian male speakers, are used for the baseline condition and for the tuning procedures. The study also examines the effect of tuning with cross-language material and the effect of simulated voice disguise, achieved by raising the fundamental frequency by four semitones and the resonance characteristics by 8%. The results show better recognition performance, measured as equal error rate (EER), for Persian than for Czech in the baseline condition, but the opposite result in the simulated disguise condition; possible reasons for this are discussed. Overall, the study suggests that both condition adaptation and reference normalisation are beneficial to recognition performance.
Pages: 209-229
Number of pages: 21
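The abstract quotes a four-semitone rise in fundamental frequency, an 8% rise in resonance characteristics, and EER as the performance measure. The following is a minimal sketch of that arithmetic, not the procedure used in the study or in VOCALISE. It assumes equal-tempered semitones, treats the 8% change as a simple multiplication of a resonance frequency by 1.08, and uses a basic threshold sweep for the EER. The function names and the example values (120 Hz, 500 Hz, the score lists) are hypothetical.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a shift of `semitones` (equal temperament assumed)."""
    return 2.0 ** (semitones / 12.0)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over the observed scores.

    A trial is accepted when its score is >= the threshold. The function
    returns the mean of the false acceptance and false rejection rates at
    the threshold where the two rates are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # Same-speaker trials that fall below the threshold are false rejections.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # Different-speaker trials at or above the threshold are false acceptances.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


# Illustrative values only; none of these numbers come from the study.
f0_original_hz = 120.0
f0_disguised_hz = f0_original_hz * semitone_ratio(4)   # about 26% higher
resonance_original_hz = 500.0
resonance_disguised_hz = resonance_original_hz * 1.08  # 8% higher

eer = equal_error_rate([2.1, 1.7, 0.4, 2.8], [0.2, -0.5, 0.9, -1.3])
```

A four-semitone shift corresponds to a ratio of 2^(4/12), roughly 1.26, so the fundamental frequency change in the simulated disguise is considerably larger in relative terms than the 8% change applied to the resonances.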