Speaker Recognition With Random Digit Strings Using Uncertainty Normalized HMM-Based i-Vectors

Cited by: 13
Authors
Maghsoodi, Nooshin [1]
Sameti, Hossein [1]
Zeinali, Hossein [2]
Stafylakis, Themos [3]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Tehran 113658639, Iran
[2] Brno Univ Technol, Fac Informat Technol, Brno 61266, Czech Republic
[3] Univ Nottingham, Comp Vis Lab, Nottingham NG7 2RD, England
Funding
EU Horizon 2020
Keywords
Text-dependent speaker verification; uncertainty compensation; text-prompted; HMM; PLDA
DOI
10.1109/TASLP.2019.2928143
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we combine Hidden Markov Models (HMMs) with i-vector extractors to address the problem of text-dependent speaker recognition with random digit strings. We employ digit-specific HMMs to segment the utterances into digits, to perform frame alignment to HMM states, and to extract Baum-Welch statistics. By making use of the natural partition of input features into digits, we train digit-specific i-vector extractors on top of each HMM and extract well-localized i-vectors, each modelling only the phonetic content corresponding to a single digit. We then examine ways to perform channel and uncertainty compensation, and we propose a novel method for using the uncertainty in the i-vector estimates. The experiments on RSR2015 Part III show that the proposed method attains Equal Error Rates (EERs) of 1.52% and 1.77% for male and female speakers, respectively, outperforming state-of-the-art methods such as x-vectors trained on vast amounts of data. Furthermore, these results are attained by a single system trained entirely on RSR2015, using a simple score-normalized cosine distance. Moreover, we show that omitting channel compensation yields only a minor degradation in performance, meaning that the system attains state-of-the-art results even without recordings from multiple handsets per speaker for training or enrolment. Similar conclusions are drawn from our experiments on the RedDots corpus, where the same method is evaluated on phrases. Finally, we report results with bottleneck features and show that further improvement is attained when fusing them with spectral features.
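As a rough illustration of what a "score-normalized cosine distance" can look like, the sketch below scores one enrolment/test pair with cosine similarity and then applies symmetric normalization (s-norm) against a cohort of impostor vectors. The abstract does not say which normalization the authors use, so s-norm is an assumption here, and the function names (`cosine_score`, `s_norm`, `score_trial`) and the cohort setup are hypothetical. Digit-specific i-vector extraction and uncertainty compensation are not modelled.

```python
import numpy as np


def cosine_score(a, b):
    """Cosine similarity between two vectors (higher means more similar)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def s_norm(raw, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization (an assumed choice, not confirmed by the abstract).

    The raw score is standardized twice: once with the statistics of the
    enrolment vector scored against the cohort, once with those of the test
    vector scored against the cohort. The two z-scores are averaged.
    """
    e = np.asarray(enroll_cohort_scores, dtype=float)
    t = np.asarray(test_cohort_scores, dtype=float)
    return 0.5 * ((raw - e.mean()) / e.std() + (raw - t.mean()) / t.std())


def score_trial(enroll_vec, test_vec, cohort):
    """Score one verification trial with cosine similarity followed by s-norm.

    `cohort` is a hypothetical set of impostor vectors, one per row.
    """
    raw = cosine_score(enroll_vec, test_vec)
    e_scores = [cosine_score(enroll_vec, c) for c in cohort]
    t_scores = [cosine_score(test_vec, c) for c in cohort]
    return s_norm(raw, e_scores, t_scores)
```

In a digit-string setting, a function like `score_trial` would presumably be applied per digit and the resulting scores combined, but how the authors combine them is not stated in the abstract.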
Pages: 1815-1825
Number of pages: 11
Related papers
50 in total
  • [41] HMM-Based Speech Recognition Using Adaptive Framing
    Goh, Yeh-Huann
    Raveendran, Paramesran
    [J]. TENCON 2009 - 2009 IEEE REGION 10 CONFERENCE, VOLS 1-4, 2009, : 226 - 230
  • [42] Privacy-preserving speaker verification system based on binary I-vectors
    Mtibaa, Aymen
    Petrovska-Delacretaz, Dijana
    Boudy, Jerome
    Ben Hamida, Ahmed
    [J]. IET BIOMETRICS, 2021, 10 (03) : 233 - 245
  • [43] Foreground and background information in an HMM-based method for recognition of isolated characters and numeral strings
    Britto, AD
    Sabourin, R
    Bortolozzi, F
    Suen, CY
    [J]. NINTH INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION, PROCEEDINGS, 2004, : 371 - 376
  • [44] Phonotactic Language Recognition using i-vectors and Phoneme Posteriogram Counts
    Fernando D'Haro, Luis
    Glembek, Ondrej
    Plchot, Oldrich
    Matejka, Pavel
    Soufifar, Mehdi
    Cordoba, Ricardo
    Cernocky, Jan
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 42 - 45
  • [45] Novel Quality Metric for Duration Variability Compensation in Speaker Verification using i-Vectors
    Poddar, Arnab
    Sahidullah, Md
    Saha, Goutam
    [J]. 2017 NINTH INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION (ICAPR), 2017, : 298 - 303
  • [46] Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors
    Miao, Yajie
    Zhang, Hao
    Metze, Florian
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (11) : 1938 - 1949
  • [47] Probabilistic approach using joint long and short session i-vectors modeling to deal with short utterances for speaker recognition
    Ben Kheder, Waad
    Matrouf, Driss
    Ajili, Moez
    Bonastre, Jean-Francois
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1830 - 1834
  • [48] Exemplar-Based Sparse Representation for Language Recognition on I-Vectors
    Jiang, Bing
    Song, Yan
    Guo, Wu
    Dai, LiRong
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2055 - 2058
  • [49] HANDLING I-VECTORS FROM DIFFERENT RECORDING CONDITIONS USING MULTI-CHANNEL SIMPLIFIED PLDA IN SPEAKER RECOGNITION
    Villalba, Jesus
    Lleida, Eduardo
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6763 - 6767
  • [50] Spoken Language Identification Based on I-vectors and Conditional Random Fields
    Heracleous, Panikos
    Mohammad, Yasser
    Takai, Koichi
    Yasuda, Keiji
    Yoneyama, Akio
    [J]. 2018 14TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE (IWCMC), 2018, : 1443 - 1447