Toward Text-independent Cross-lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset

Cited by: 1
Authors
Wu, Yi-Chieh [1 ]
Liao, Wen-Hung [1 ]
Affiliation
[1] Natl Chengchi Univ, Dept Comp Sci, Taipei, Taiwan
Keywords
Speaker recognition; Acoustic features; Text-independent speaker identification; Cross-lingual dataset; Verification
DOI
10.1109/ICPR48806.2021.9412170
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over 40% of the world's population is bilingual. Existing speaker identification and verification systems, however, assume that the same language is used in both the enrollment and recognition stages. In this work, we investigate the feasibility of employing multilingual speech for biometric applications. We establish a dataset containing audio recorded in English, Mandarin, and Taiwanese. Three acoustic features, namely i-vector, d-vector, and x-vector, were evaluated on both speaker verification (SV) and speaker identification (SI) tasks. Preliminary experimental results indicate that the x-vector achieves the best overall performance. Additionally, the model trained on hybrid data achieves the highest accuracy, at the cost of extra data collection effort. In SI tasks, all models achieved over 91% cross-lingual accuracy using 3-second audio. In SV tasks, the highest equal error rate (EER) across the cross-lingual tests is 6.52%, observed for the model trained on the English corpus. These results suggest that cross-lingual speech can feasibly be used to build text-independent speaker recognition systems.
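The abstract reports SI accuracy and SV equal error rate (EER) for embeddings such as x-vectors. As an illustration only, the sketch below assumes cosine scoring between an enrollment embedding (e.g. from English speech) and a test embedding (e.g. from Mandarin speech); the record does not state which scoring back-end the authors used (PLDA is a common alternative). EER is approximated by sweeping a decision threshold over the observed scores and taking the point where the false acceptance rate (FAR) and false rejection rate (FRR) are closest. The function names `cosine_score`, `identify`, and `equal_error_rate` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two fixed-length speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(test_embedding: Sequence[float],
             enrolled: Dict[str, Sequence[float]]) -> str:
    """Closed-set SI (assumed cosine back-end): pick the best-scoring enrolled speaker."""
    return max(enrolled, key=lambda spk: cosine_score(test_embedding, enrolled[spk]))


def equal_error_rate(scores: Sequence[float],
                     labels: Sequence[int]) -> Tuple[float, float]:
    """Approximate EER by a threshold sweep over the observed scores.

    labels: 1 = target trial (same speaker), 0 = non-target trial.
    Returns (eer, threshold) where |FAR - FRR| is smallest; a trial is
    accepted when its score is >= the threshold.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    best_gap, best_eer, best_threshold = float("inf"), 0.0, 0.0
    for t in sorted(set(scores)):
        false_accepts = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        false_rejects = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2, t
    return best_eer, best_threshold
```

For example, with target scores `[0.9, 0.6, 0.4]` and non-target scores `[0.5, 0.3, 0.2]`, FAR and FRR both equal 1/3 at threshold 0.5, so the approximate EER is about 33%. A threshold sweep over observed scores is simple but coarse; toolkits often interpolate the ROC curve instead.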
Pages: 8515-8522
Number of pages: 8