Multi-Task Learning for Text-dependent Speaker Verification

Cited by: 0

Authors:

Chen, Nanxin [1]

Qian, Yanmin

Yu, Kai

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai, Peoples R China

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

deep neural network; multi-task learning; speaker verification; discriminant analysis; probabilistic linear discriminant analysis; deep learning;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Text-dependent speaker verification uses short utterances and verifies both speaker identity and text content. Because of this, traditional state-of-the-art speaker verification approaches, such as i-vector, may not work well. Recently, there has been interest in applying deep learning to speaker verification; however, in previous work, standalone deep learning systems have not achieved state-of-the-art performance and have had to be used in system combination or as tandem features to obtain gains. In this paper, a novel multi-task deep learning framework is proposed for text-dependent speaker verification. First, multi-task deep learning is employed to learn both speaker identity and text information. With the learned network, the utterance-level average of the outputs of the last hidden layer, referred to as the j-vector (joint vector), is extracted. A discriminant function, with classes defined by the multi-task labels on both speaker and text, is then applied to the j-vectors as the decision function for closed-set recognition, and Probabilistic Linear Discriminant Analysis (PLDA), with classes defined by the multi-task labels, is applied to the j-vectors for verification. Experiments on the RSR2015 corpus showed that the j-vector approach leads to good results on the evaluation data. The proposed multi-task deep learning system achieved 0.54% EER, 0.14% EER for the closed-set condition.
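
The j-vector extraction step described in the abstract (averaging the last hidden layer's outputs over all frames of an utterance) can be illustrated with a minimal NumPy sketch. The abstract does not specify the network architecture, so the sigmoid activation, the layer sizes, the 39-dimensional frame features, and the random weights below are all assumptions made for illustration; the helper names `forward_hidden` and `extract_j_vector` are hypothetical. The speaker and text output layers are omitted because, per the abstract, only the last hidden layer is used at extraction time.

```python
import numpy as np


def forward_hidden(frames, weights, biases):
    """Pass frame-level features through a stack of hidden layers.

    Returns the outputs of the last hidden layer, one row per frame.
    The sigmoid activation is an assumption, not taken from the paper.
    """
    h = frames
    for W, b in zip(weights, biases):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return h


def extract_j_vector(frames, weights, biases):
    """Utterance-level average of the last hidden layer outputs."""
    return forward_hidden(frames, weights, biases).mean(axis=0)


# Toy setup: all dimensions and weights are placeholders (assumptions).
rng = np.random.default_rng(0)
feat_dim, hidden_dim, n_frames = 39, 16, 120
weights = [
    rng.normal(scale=0.1, size=(feat_dim, hidden_dim)),
    rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
]
biases = [np.zeros(hidden_dim), np.zeros(hidden_dim)]

utterance = rng.normal(size=(n_frames, feat_dim))  # one utterance, 120 frames
j_vector = extract_j_vector(utterance, weights, biases)
print(j_vector.shape)
```

In a trained system the weights would come from the multi-task network, and the resulting fixed-length vectors would then be passed to the discriminant function or PLDA back end mentioned in the abstract.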


Pages: 185-189

Page count: 5

