Speaker Recognition With Random Digit Strings Using Uncertainty Normalized HMM-Based i-Vectors

Cited by: 13
Authors
Maghsoodi, Nooshin [1]
Sameti, Hossein [1]
Zeinali, Hossein [2]
Stafylakis, Themos [3]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Tehran 113658639, Iran
[2] Brno Univ Technol, Fac Informat Technol, Brno 61266, Czech Republic
[3] Univ Nottingham, Comp Vis Lab, Nottingham NG7 2RD, England
Funding
EU Horizon 2020;
Keywords
Text-dependent speaker verification; uncertainty compensation; text-prompted; HMM; PLDA;
DOI
10.1109/TASLP.2019.2928143
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
In this paper, we combine Hidden Markov Models (HMMs) with i-vector extractors to address the problem of text-dependent speaker recognition with random digit strings. We employ digit-specific HMMs to segment the utterances into digits, to align frames to HMM states, and to extract Baum-Welch statistics. Exploiting the natural partition of the input features into digits, we train a digit-specific i-vector extractor on top of each HMM and extract well-localized i-vectors, each modelling only the phonetic content of a single digit. We then examine ways to perform channel and uncertainty compensation, and we propose a novel method for using the uncertainty in the i-vector estimates. Experiments on RSR2015 Part III show that the proposed method attains Equal Error Rates (EERs) of 1.52% and 1.77% for male and female speakers, respectively, outperforming state-of-the-art methods such as x-vectors trained on vast amounts of data. Furthermore, these results are attained by a single system trained entirely on RSR2015, using a simple score-normalized cosine distance. Moreover, we show that omitting channel compensation yields only a minor degradation in performance, meaning that the system attains state-of-the-art results even without recordings from multiple handsets per speaker for training or enrolment. Similar conclusions are drawn from our experiments on the RedDots corpus, where the same method is evaluated on phrases. Finally, we report results with bottleneck features and show that further improvement is attained when fusing them with spectral features.
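The abstract mentions scoring with a "score-normalized cosine distance" but does not say which normalization is used. As a rough illustration only, the sketch below assumes symmetric normalization (s-norm) against a cohort of impostor vectors. The function names, the cohort, and the use of plain NumPy vectors in place of real i-vectors are all assumptions for this example and are not taken from the paper.

```python
import numpy as np


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def s_norm_score(enroll, test, cohort):
    """Cosine score with symmetric score normalization (assumed variant).

    `cohort` is a hypothetical set of impostor vectors, one per row.
    The raw score is standardized twice: once with the statistics of the
    enrolment vector's cohort scores, once with those of the test vector,
    and the two normalized scores are averaged.
    """
    raw = cosine_score(enroll, test)
    enroll_cohort = np.array([cosine_score(enroll, c) for c in cohort])
    test_cohort = np.array([cosine_score(test, c) for c in cohort])
    z_part = (raw - enroll_cohort.mean()) / enroll_cohort.std()
    t_part = (raw - test_cohort.mean()) / test_cohort.std()
    return 0.5 * (z_part + t_part)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cohort = rng.normal(size=(50, 16))   # stand-in impostor vectors
    enroll = rng.normal(size=16)         # stand-in enrolment vector
    test = enroll + 0.1 * rng.normal(size=16)  # close to the enrolment vector
    print(s_norm_score(enroll, test, cohort))
```

In a digit-based system such as the one described, a score of this kind would presumably be computed per digit and then combined across the digits of an utterance; how the paper combines them is not stated in the abstract.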
Pages: 1815-1825
Number of pages: 11