Multi-Task Learning for Text-dependent Speaker Verification

被引:0
|
作者
Chen, Nanxin [1 ]
Qian, Yanmin
Yu, Kai
机构
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai, Peoples R China
关键词
deep neural network; multi-task learning; speaker verification; discriminant analysis; probabilistic linear discriminant analysis; deep learning;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Text-dependent speaker verification uses short utterances and verifies both speaker identity and text contents. Due to this nature, traditional state-of-the-art speaker verification approaches, such as i-vector, may not work well. Recently, there has been interest of applying deep learning to speaker verification, however in previous works, standalone deep learning systems have not achieved state-of-the-art performance and they have to be used in system combination or as tandem features to obtain gains. In this paper, a novel multi-task deep learning framework is proposed for text-dependent speaker verification. First, multi-task deep learning is employed to learn both speaker identity and text information. With the learned network, utterance level average of the outputs of the last hidden layer, referred to as j-vector, means joint-vector, is extracted. Discriminant function, with classes defined as multi-task labels on both speaker and text, is then applied to the j-vectors as the decision function for the closed-set recognition, and Probabilistic Linear Discriminant Analysis (PLDA), with classes defined as on the multi-task labels, is applied to the j-vectors for the verification. Experiments on the RSR2015 corpus showed that the j-vector approach leads to good result on the evaluation data. The proposed multi-task deep learning system achieved 0.54% EER, 0.14% EER for the closed-set condition.
引用
收藏
页码:185 / 189
页数:5
相关论文
共 50 条
  • [1] On Residual CNN in Text-Dependent Speaker Verification Task
    Malykh, Egor
    Novoselov, Sergey
    Kudashev, Oleg
    [J]. SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458 : 593 - 601
  • [2] Deep Embedding Learning for Text-Dependent Speaker Verification
    Zhang, Peng
    Hu, Peng
    Zhang, Xueliang
    [J]. INTERSPEECH 2020, 2020, : 3461 - 3465
  • [3] Text-dependent speaker verification system
    Qin, Bing
    Chen, Huipeng
    Li, Guangqi
    Liu, Songbo
    [J]. Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2000, 32 (04): : 16 - 18
  • [4] Unsupervised Learning of HMM Topology for Text-dependent Speaker Verification
    Liu, Ming
    Huang, Thomas
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 921 - 924
  • [5] Deep feature for text-dependent speaker verification
    Liu, Yuan
    Qian, Yanmin
    Chen, Nanxin
    Fu, Tianfan
    Zhang, Ya
    Yu, Kai
    [J]. SPEECH COMMUNICATION, 2015, 73 : 1 - 13
  • [6] Text-Dependent Speaker Verification System: A Review
    Debnath, Saswati
    Soni, B.
    Baruah, U.
    Sah, D. K.
    [J]. PROCEEDINGS OF 2015 IEEE 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND CONTROL (ISCO), 2015,
  • [7] Robust Methods for Text-Dependent Speaker Verification
    Bhukya, Ramesh K.
    Prasanna, S. R. Mahadeva
    Sarma, Biswajit Dev
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2019, 38 (11) : 5253 - 5288
  • [8] Bidirectional Attention for Text-Dependent Speaker Verification
    Fang, Xin
    Gao, Tian
    Zou, Liang
    Ling, Zhenhua
    [J]. SENSORS, 2020, 20 (23) : 1 - 17
  • [9] Content Normalization for Text-dependent Speaker Verification
    Dey, Subhadeep
    Madikeri, Srikanth
    Motlicek, Petr
    Ferras, Marc
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1482 - 1486
  • [10] Robust Methods for Text-Dependent Speaker Verification
    Ramesh K. Bhukya
    S. R. Mahadeva Prasanna
    Biswajit Dev Sarma
    [J]. Circuits, Systems, and Signal Processing, 2019, 38 : 5253 - 5288