Text-dependent speaker verification based on i-vectors, Neural Networks and Hidden Markov Models

Cited by: 21
Authors
Zeinali, Hossein [1 ,2 ,3 ]
Sameti, Hossein [1 ]
Burget, Lukas [2 ,3 ]
Cernocky, Jan Honza [2 ,3 ]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Speech Proc Lab, Tehran, Iran
[2] Brno Univ Technol, Speech FIT, Brno, Czech Republic
[3] IT4I Ctr Excellence, Brno, Czech Republic
Keywords
Deep Neural Network; Text-dependent; Speaker verification; i-Vector; Frame alignment; Bottleneck features;
DOI
10.1016/j.csl.2017.04.005
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Inspired by the success of Deep Neural Networks (DNN) in text-independent speaker recognition, we have recently demonstrated that similar ideas can also be applied to the text-dependent speaker verification task. In this paper, we describe new advances with our state-of-the-art i-vector based approach to text-dependent speaker verification, which also makes use of different DNN techniques. In order to collect sufficient statistics for i-vector extraction, different frame alignment models are compared, such as GMMs, phonemic HMMs, or DNNs trained for senone classification. We also experiment with DNN-based bottleneck features and their combinations with standard MFCC features. We experiment with a few different DNN configurations and investigate the importance of training DNNs on 16 kHz speech. The results are reported on the RSR2015 dataset, where training material is available for all possible enrollment and test phrases. Additionally, we report results on the more challenging RedDots dataset, where the system is built in a truly phrase-independent way. (C) 2017 Elsevier Ltd. All rights reserved.
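The abstract refers to collecting sufficient statistics for i-vector extraction from a frame alignment. As a rough illustration only, the sketch below accumulates the standard zero-order and first-order statistics from per-frame class posteriors. It is not the authors' implementation: the function name `collect_stats` and the toy inputs are hypothetical, and the posteriors are assumed to come from whichever alignment model is in use (GMM components, HMM states, or DNN senone outputs).

```python
from typing import List, Tuple


def collect_stats(
    frames: List[List[float]],
    posteriors: List[List[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zero- and first-order statistics (hypothetical helper).

    frames:     T feature vectors of dimension D (e.g. MFCCs).
    posteriors: T rows of C alignment posteriors, one per frame.
    Returns (zero, first), where zero[c] = sum_t gamma_t(c) and
    first[c] = sum_t gamma_t(c) * x_t.
    """
    if not frames or len(frames) != len(posteriors):
        raise ValueError("frames and posteriors must be non-empty and equal in length")

    num_classes = len(posteriors[0])
    dim = len(frames[0])
    zero = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]

    for x, gamma in zip(frames, posteriors):
        for c, g in enumerate(gamma):
            zero[c] += g  # soft count of frames assigned to class c
            for d, value in enumerate(x):
                first[c][d] += g * value  # posterior-weighted feature sum
    return zero, first


# Toy example: two 2-dimensional frames aligned to two classes.
toy_frames = [[1.0, 2.0], [3.0, 4.0]]
toy_posteriors = [[1.0, 0.0], [0.5, 0.5]]
zero_stats, first_stats = collect_stats(toy_frames, toy_posteriors)
```

Swapping the alignment model changes only where `posteriors` comes from; the accumulation itself stays the same, which is what makes the comparison of alignment models described in the abstract possible within one i-vector pipeline.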
Pages: 53 - 71
Page count: 19
Related papers
50 in total
  • [1] DEEP NEURAL NETWORKS FOR SMALL FOOTPRINT TEXT-DEPENDENT SPEAKER VERIFICATION
    Variani, Ehsan
    Lei, Xin
    McDermott, Erik
    Moreno, Ignacio Lopez
    Gonzalez-Dominguez, Javier
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [2] Unsupervised Data-driven Hidden Markov Modeling for Text-dependent Speaker Verification
    Petrovska-Delacretaz, Dijana
    Khemiri, Houssemeddine
    [J]. ICPRAM: PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS, 2017, : 199 - 207
  • [3] ATTENTION-BASED MODELS FOR TEXT-DEPENDENT SPEAKER VERIFICATION
    Chowdhury, F. A. Rezaur Rahman
    Wang, Quan
    Moreno, Ignacio Lopez
    Wan, Li
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5359 - 5363
  • [4] Emotional Speaker Verification Based on I-vectors
    Mackova, Lenka
    Cizmar, Anton
    [J]. 2014 5TH IEEE CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM), 2014, : 533 - 536
  • [5] DEEP NEURAL NETWORK BASED POSTERIORS FOR TEXT-DEPENDENT SPEAKER VERIFICATION
    Dey, Subhadeep
    Madikeri, Srikanth
    Ferras, Marc
    Motlicek, Petr
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5050 - 5054
  • [6] Supervector Extraction for Encoding Speaker and Phrase Information with Neural Networks for Text-Dependent Speaker Verification
    Mingote, Victoria
    Miguel, Antonio
    Ortega, Alfonso
    Lleida, Eduardo
    [J]. APPLIED SCIENCES-BASEL, 2019, 9 (16):
  • [7] Text-dependent speaker verification system
    Qin, Bing
    Chen, Huipeng
    Li, Guangqi
    Liu, Songbo
    [J]. Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2000, 32 (04): : 16 - 18
  • [8] Senone I-Vectors for Robust Speaker Verification
    Tan, Zhili
    Zhu, Yingke
    Mak, Man-Wai
    Mak, Brian Kan-Wing
    [J]. 2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [9] Robust Speaker Verification Using GFCC Based i-Vectors
    Jeevan, Medikonda
    Dhingra, Atul
    Hanmandlu, M.
    Panigrahi, B. K.
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL, NETWORKS, COMPUTING, AND SYSTEMS (ICSNCS 2016), VOL 1, 2017, 395 : 85 - 91
  • [10] A ROBUST TO OUTLIERS HIDDEN MARKOV MODEL WITH APPLICATION IN TEXT-DEPENDENT SPEAKER IDENTIFICATION
    Chatzis, Sotirios
    Varvarigou, Theodora
    [J]. ICSPC: 2007 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, VOLS 1-3, PROCEEDINGS, 2007, : 804 - 807