Text-dependent speaker verification based on i-vectors, Neural Networks and Hidden Markov Models

Cited by: 21

Authors:

Zeinali, Hossein [1,2,3]

Sameti, Hossein [1]

Burget, Lukas [2,3]

Cernocky, Jan Honza [2,3]

Affiliations:

[1] Sharif Univ Technol, Dept Comp Engn, Speech Proc Lab, Tehran, Iran

[2] Brno Univ Technol, Speech FIT, Brno, Czech Republic

[3] IT4I Ctr Excellence, Brno, Czech Republic

Source:

COMPUTER SPEECH AND LANGUAGE | 2017 / Vol. 46

Keywords:

Deep Neural Network; Text-dependent; Speaker verification; i-Vector; Frame alignment; Bottleneck features;

DOI:

10.1016/j.csl.2017.04.005

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Inspired by the success of Deep Neural Networks (DNN) in text-independent speaker recognition, we have recently demonstrated that similar ideas can also be applied to the text-dependent speaker verification task. In this paper, we describe new advances with our state-of-the-art i-vector based approach to text-dependent speaker verification, which also makes use of different DNN techniques. In order to collect sufficient statistics for i-vector extraction, different frame alignment models are compared, such as GMMs, phonemic HMMs, or DNNs trained for senone classification. We also experiment with DNN based bottleneck features and their combinations with standard MFCC features. We experiment with a few different DNN configurations and investigate the importance of training DNNs on 16 kHz speech. The results are reported on the RSR2015 dataset, where training material is available for all possible enrollment and test phrases. Additionally, we report results on the more challenging RedDots dataset, where the system is built in a truly phrase-independent way. (C) 2017 Elsevier Ltd. All rights reserved.


Pages: 53-71

Page count: 19
