Deep feature for text-dependent speaker verification

Cited by: 140

Authors:

Liu, Yuan [1]

Qian, Yanmin [1]

Chen, Nanxin [1]

Fu, Tianfan [1]

Zhang, Ya [2]

Yu, Kai [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai 200030, Peoples R China

[2] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai 200030, Peoples R China

Source:

SPEECH COMMUNICATION | 2015 / Vol. 73

Keywords:

Text-dependent speaker verification; Deep neural networks; Deep features; RSR2015; HIDDEN MARKOV-MODELS; NEURAL-NETWORKS; MACHINES;

DOI:

10.1016/j.specom.2015.07.003

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Recently, deep learning has been successfully used in speech recognition; however, it has not been carefully explored or widely accepted for speaker verification. To incorporate deep learning into speaker verification, this paper proposes novel approaches for extracting and using features from deep learning models for text-dependent speaker verification. In contrast to traditional short-term spectral features, such as MFCC or PLP, outputs from the hidden layers of various deep models are employed as deep features for text-dependent speaker verification. Four types of deep models are investigated: deep Restricted Boltzmann Machines, speech-discriminant Deep Neural Network (DNN), speaker-discriminant DNN, and multi-task joint-learned DNN. Once deep features are extracted, they may be used within either the GMM-UBM framework or the identity vector (i-vector) framework. Joint linear discriminant analysis and probabilistic linear discriminant analysis are proposed as effective back-end classifiers for identity vector based deep features. These approaches were evaluated on the RSR2015 data corpus. Experiments showed that deep feature based methods can obtain significant performance improvements compared to the traditional baselines, whether they are directly applied in the GMM-UBM system or utilized as identity vectors. The EER of the best system using the proposed identity vector is 0.10%, only one fifteenth of that of the GMM-UBM baseline. (C) 2015 Elsevier B.V. All rights reserved.
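
The abstract reports results as an equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate coincide. As a rough illustration of how that metric is commonly approximated from trial scores, the following is a minimal sketch, not the paper's scoring tool. The function name `equal_error_rate` and the example scores are assumptions made up for illustration and do not come from the paper or from RSR2015.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by trying every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a genuine (target) trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores for illustration only (not data from the paper).
targets = [0.9, 0.8, 0.75, 0.6, 0.3]
impostors = [0.7, 0.4, 0.35, 0.2, 0.1]
eer = equal_error_rate(targets, impostors)
print(f"EER = {eer:.2%}")
```

With only a handful of trials the two error curves may not cross exactly, so this sketch reports the midpoint at the closest threshold. Evaluation toolkits typically interpolate instead.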


Pages: 1-13

Page count: 13

Related items (50 in total):

[1] Covariance Based Deep Feature for Text-Dependent Speaker Verification
Wang, Shuai
Dinkel, Heinrich
Qian, Yanmin
Yu, Kai
[J]. INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING, 2018, 11266 : 231 - 242
[2] Deep Embedding Learning for Text-Dependent Speaker Verification
Zhang, Peng
Hu, Peng
Zhang, Xueliang
[J]. INTERSPEECH 2020, 2020, : 3461 - 3465
[3] Tandem Deep Features for Text-Dependent Speaker Verification
Fu, Tianfan
Qian, Yanmin
Liu, Yuan
Yu, Kai
[J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1327 - 1331
[4] Improved Deep Speaker Feature Learning for Text-Dependent Speaker Recognition
Li, Lantian
Lin, Yiye
Zhang, Zhiyong
Wang, Dong
[J]. 2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 426 - 429
[5] EXPLORING SEQUENTIAL CHARACTERISTICS IN SPEAKER BOTTLENECK FEATURE FOR TEXT-DEPENDENT SPEAKER VERIFICATION
Chen, Liping
Zhao, Yong
Zhang, Shi-Xiong
Li, Jie
Ye, Guoli
Soong, Frank
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5364 - 5368
[6] DEEP NEURAL NETWORKS FOR SMALL FOOTPRINT TEXT-DEPENDENT SPEAKER VERIFICATION
Variani, Ehsan
Lei, Xin
McDermott, Erik
Moreno, Ignacio Lopez
Gonzalez-Dominguez, Javier
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[7] Text-Dependent Speaker Verification System: A Review
Debnath, Saswati
Soni, B.
Baruah, U.
Sah, D. K.
[J]. PROCEEDINGS OF 2015 IEEE 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND CONTROL (ISCO), 2015,
[8] DEEP NEURAL NETWORK BASED POSTERIORS FOR TEXT-DEPENDENT SPEAKER VERIFICATION
Dey, Subhadeep
Madikeri, Srikanth
Ferras, Marc
Motlicek, Petr
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5050 - 5054
[9] Bidirectional Attention for Text-Dependent Speaker Verification
Fang, Xin
Gao, Tian
Zou, Liang
Ling, Zhenhua
[J]. SENSORS, 2020, 20 (23) : 1 - 17
[10] Robust Methods for Text-Dependent Speaker Verification
Bhukya, Ramesh K.
Prasanna, S. R. Mahadeva
Sarma, Biswajit Dev
[J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2019, 38 (11) : 5253 - 5288
