EXPLORING SEQUENTIAL CHARACTERISTICS IN SPEAKER BOTTLENECK FEATURE FOR TEXT-DEPENDENT SPEAKER VERIFICATION

Times cited: 0
Authors
Chen, Liping [1]
Zhao, Yong [2]
Zhang, Shi-Xiong [2]
Li, Jie [1]
Ye, Guoli [2]
Soong, Frank [3]
Affiliations
[1] Microsoft Search Technology Center Asia, Beijing, People's Republic of China
[2] Microsoft Corporation, One Microsoft Way, Redmond, WA 98052, USA
[3] Microsoft Research Asia, Beijing, People's Republic of China
Keywords
Text-dependent speaker verification; sequential speaker characteristics; speaker supervector; dynamic time warping; VARIABILITY
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, given speaker bottleneck feature vectors extracted with speaker-discriminant neural networks, we focus on exploiting sequential speaker characteristics for text-dependent speaker verification. In each evaluation trial, speaker supervectors represent the sequential speaker characteristics of the compared speech utterances. To obtain them, dynamic time warping (DTW) is used to warp the variable-length speaker feature vector sequences of the utterances to the same length. Each utterance's speaker supervector is then the concatenation of its speaker feature vectors. We use Euclidean distance and a support vector machine (SVM) to compute decision scores on the speaker supervectors. Our experiments on a Microsoft internal keyword-spotting database showed the effectiveness of the proposed speaker supervector for text-dependent speaker verification. Moreover, with the SVM backend, the speaker supervector achieved the best equal error rate (EER), 1.627%, outperforming the combination of i-vector and probabilistic linear discriminant analysis (PLDA).
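The trial pipeline in the abstract (DTW alignment, concatenation into a supervector, Euclidean scoring, evaluation by EER) can be illustrated with a minimal NumPy sketch. It rests on assumptions the abstract does not state: the test sequence is warped onto the enrollment utterance's time axis, test frames aligned to the same enrollment frame are averaged, and the score is the negated Euclidean distance. All function names are hypothetical, the SVM backend is omitted, and the EER is a simple threshold-sweep approximation rather than the authors' evaluation code.

```python
import numpy as np


def dtw_path(ref: np.ndarray, test: np.ndarray) -> list[tuple[int, int]]:
    """Classic DTW with Euclidean frame cost; returns the optimal alignment
    path as (ref_index, test_index) pairs from the first to the last frame."""
    n, m = len(ref), len(test)
    # Pairwise Euclidean distances between frames, shape (n, m)
    cost = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def warp_to_reference(ref: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Assumption: map `test` onto the time axis of `ref` by averaging the
    test frames that DTW aligns to each reference frame."""
    warped = np.zeros_like(ref, dtype=float)
    counts = np.zeros(len(ref))
    for i, j in dtw_path(ref, test):
        warped[i] += test[j]
        counts[i] += 1
    # Every reference frame appears on a DTW path, so counts >= 1
    return warped / counts[:, None]


def supervector(frames: np.ndarray) -> np.ndarray:
    """Concatenate the frame-level speaker vectors in time order."""
    return frames.reshape(-1)


def euclidean_score(enroll: np.ndarray, test: np.ndarray) -> float:
    """Negated Euclidean distance between supervectors (higher = more similar)."""
    sv_enroll = supervector(enroll)
    sv_test = supervector(warp_to_reference(enroll, test))
    return -float(np.linalg.norm(sv_enroll - sv_test))


def equal_error_rate(target_scores, nontarget_scores) -> float:
    """Approximate EER: sweep thresholds over the observed scores and return
    the mean of FAR and FRR where the two are closest."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(np.concatenate([target_scores, nontarget_scores])):
        far = np.mean(nontarget_scores >= t)  # impostor trials accepted
        frr = np.mean(target_scores < t)      # target trials rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    enroll = rng.normal(size=(20, 8))    # hypothetical 8-dim bottleneck frames
    same = np.repeat(enroll, 2, axis=0)  # same content, spoken twice as slowly
    other = rng.normal(size=(35, 8))     # unrelated sequence of another length
    print(euclidean_score(enroll, same), euclidean_score(enroll, other))
```

Warping the test utterance onto the enrollment time axis guarantees both supervectors have the same dimension (enrollment frames times feature dimension), which is what makes a fixed-length distance or an SVM kernel applicable; other warping conventions are equally plausible.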
Pages: 5364-5368
Number of pages: 5