Speaker Recognition With Random Digit Strings Using Uncertainty Normalized HMM-Based i-Vectors

Cited by: 13

Authors:

Maghsoodi, Nooshin [1]

Sameti, Hossein [1]

Zeinali, Hossein [2]

Stafylakis, Themos [3]

Affiliations:

[1] Sharif Univ Technol, Dept Comp Engn, Tehran 113658639, Iran

[2] Brno Univ Technol, Fac Informat Technol, Brno 61266, Czech Republic

[3] Univ Nottingham, Comp Vis Lab, Nottingham NG7 2RD, England

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2019, Vol. 27, No. 11

Funding:

European Union Horizon 2020;

Keywords:

Text dependent speaker verification; uncertainty compensation; text-prompted; HMM; PLDA;

DOI:

10.1109/TASLP.2019.2928143

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we combine Hidden Markov Models (HMMs) with i-vector extractors to address the problem of text-dependent speaker recognition with random digit strings. We employ digit-specific HMMs to segment the utterances into digits, to perform frame alignment to HMM states and to extract Baum-Welch statistics. By making use of the natural partition of input features into digits, we train digit-specific i-vector extractors on top of each HMM and we extract well-localized i-vectors, each modelling merely the phonetic content corresponding to a single digit. We then examine ways to perform channel and uncertainty compensation, and we propose a novel method for using the uncertainty in the i-vector estimates. The experiments on RSR2015 part III show that the proposed method attains 1.52% and 1.77% Equal Error Rate (EER) for male and female speakers, respectively, outperforming state-of-the-art methods such as x-vectors, trained on vast amounts of data. Furthermore, these results are attained by a single system trained entirely on RSR2015, and by a simple score-normalized cosine distance. Moreover, we show that the omission of channel compensation yields only a minor degradation in performance, meaning that the system attains state-of-the-art results even without recordings from multiple handsets per speaker for training or enrolment. Similar conclusions are drawn from our experiments on the RedDots corpus, where the same method is evaluated on phrases. Finally, we report results with bottleneck features and show that further improvement is attained when fusing them with spectral features.
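
The abstract mentions scoring with a "simple score-normalized cosine distance" but does not say which normalization is used. As a rough illustration only, the sketch below assumes symmetric normalization (often called s-norm) against a cohort of impostor vectors. The function names `cosine` and `s_norm`, the toy three-dimensional vectors, and the cohort are all hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def s_norm(enroll, test, cohort):
    """Symmetric score normalization of a cosine score.

    Assumption: the abstract does not name the normalization variant;
    s-norm is used here purely as an example.
    """
    raw = cosine(enroll, test)
    # Scores of each side of the trial against the impostor cohort.
    enroll_scores = [cosine(enroll, c) for c in cohort]
    test_scores = [cosine(test, c) for c in cohort]
    # Standardize the raw score with each side's cohort statistics.
    z_enroll = (raw - mean(enroll_scores)) / pstdev(enroll_scores)
    z_test = (raw - mean(test_scores)) / pstdev(test_scores)
    return 0.5 * (z_enroll + z_test)


# Toy vectors standing in for i-vectors (hypothetical values).
enroll_vec = [1.0, 0.2, 0.0]
same_speaker = [0.9, 0.3, 0.1]
other_speaker = [0.0, 0.1, 1.0]
cohort_vecs = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
               [1.0, 1.0, 1.0], [-1.0, 0.5, 0.2]]

target_score = s_norm(enroll_vec, same_speaker, cohort_vecs)
impostor_score = s_norm(enroll_vec, other_speaker, cohort_vecs)
```

In this toy setup the target trial receives a higher normalized score than the impostor trial. In a real system the cohort would be drawn from held-out speakers, and the decision threshold would be tuned on development data.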

Pages: 1815-1825

Page count: 11
