Toward Text-independent Cross-lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset

Cited by: 1

Authors:

Wu, Yi-Chieh [1]

Liao, Wen-Hung [1]

Affiliations:

[1] National Chengchi University, Department of Computer Science, Taipei, Taiwan

Source:

2020 25th International Conference on Pattern Recognition (ICPR) | 2021

Keywords:

Speaker recognition; Acoustic features; Text-independent speaker identification; Cross-lingual dataset; VERIFICATION;

DOI:

10.1109/ICPR48806.2021.9412170

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Over 40% of the world's population is bilingual. Existing speaker identification and verification systems, however, assume that the same language is used in both the enrollment and recognition stages. In this work, we investigate the feasibility of employing multilingual speech for biometric applications. We establish a dataset containing audio recorded in English, Mandarin, and Taiwanese. Three acoustic features, namely i-vector, d-vector, and x-vector, are evaluated on both speaker verification (SV) and speaker identification (SI) tasks. Preliminary experimental results indicate that x-vector achieves the best overall performance. Additionally, the model trained with hybrid data demonstrates the highest accuracy, at the cost of extra data collection effort. In SI tasks, we obtained over 91% cross-lingual accuracy in all models using 3-second audio. In SV tasks, the equal error rate (EER) across cross-lingual tests is at most 6.52%, observed on the model trained on the English corpus. The outcome suggests the feasibility of adopting cross-lingual speech in building text-independent speaker recognition systems.
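
The EER quoted for the SV tasks is the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch only (this record does not describe the authors' scoring pipeline), the Python function below approximates it from two lists of similarity scores. The function name `equal_error_rate`, the toy score values, and the convention that a higher score means "same speaker" are all illustrative assumptions.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from similarity scores (higher = more similar).

    genuine:  scores of trials where both utterances come from the same speaker
    impostor: scores of trials where the utterances come from different speakers
    Both arguments and the scoring convention are assumptions for illustration.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")

    best_gap, eer = float("inf"), None
    # Use every observed score as a candidate decision threshold.
    for threshold in sorted(set(genuine) | set(impostor)):
        # False acceptance: an impostor trial scores at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        # False rejection: a genuine trial scores below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented for this example (not taken from the paper).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine_scores, impostor_scores))  # → 0.25
```

With only a handful of trials the two error curves rarely cross exactly, which is why the sketch averages the two rates at the closest threshold; evaluation toolkits typically interpolate instead.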


Pages: 8515-8522

Number of pages: 8
