EXPLORING SEQUENTIAL CHARACTERISTICS IN SPEAKER BOTTLENECK FEATURE FOR TEXT-DEPENDENT SPEAKER VERIFICATION

Cited by: 0

Authors:

Chen, Liping [1]

Zhao, Yong [2]

Zhang, Shi-Xiong [2]

Li, Jie [1]

Ye, Guoli [2]

Soong, Frank [3]

Affiliations:

[1] Microsoft Search Technol Ctr Asia, Beijing, Peoples R China

[2] Microsoft Corp, One Microsoft Way, Redmond, WA 98052 USA

[3] Microsoft Res Asia, Beijing, Peoples R China

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Text-dependent speaker verification; sequential speaker characteristics; speaker supervector; dynamic time warping; VARIABILITY;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, given the speaker bottleneck feature vectors extracted with speaker-discriminant neural networks, we focus on using the sequential speaker characteristics for text-dependent speaker verification. In each evaluation trial, speaker supervectors are used to represent the sequential speaker characteristics rendered in the compared speech utterances. To this end, dynamic time warping is used to warp the variable-length speaker feature vector sequences of the utterances to the same length. Thereafter, for every utterance, a speaker supervector is obtained as the concatenation of its speaker feature vectors. We use Euclidean distance and a support vector machine (SVM) to compute the decision score on the speaker supervectors. Our experiments on a Microsoft internal keyword-spotting database showed the effectiveness of the proposed speaker supervector for text-dependent speaker verification. Moreover, when the SVM backend was used for scoring, the speaker supervector achieved the best EER of 1.627%, better than the combination of i-vector and probabilistic linear discriminant analysis.
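
The pipeline described in the abstract (warp two variable-length feature sequences to a common length with dynamic time warping, concatenate the frames into a supervector, then score with Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (dtw_path, warp_to_reference, supervector, euclidean_score) are hypothetical, and the sketch assumes that the test sequence is warped onto the enrollment sequence and that test frames aligned to the same enrollment frame are averaged; the abstract does not specify either detail. The SVM backend is not shown.

```python
import numpy as np


def dtw_path(ref, seq):
    """Return the DTW alignment path between two (frames, dim) arrays."""
    n, m = len(ref), len(seq)
    # Frame-to-frame Euclidean distances, shape (n, m).
    dist = np.linalg.norm(ref[:, None, :] - seq[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
            cost[i, j] = dist[i - 1, j - 1] + best_prev
    # Backtrack from the end of both sequences to their start.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def warp_to_reference(ref, seq):
    """Warp seq to the length of ref (assumption: average aligned frames)."""
    warped = np.zeros_like(ref, dtype=float)
    counts = np.zeros(len(ref))
    for i, j in dtw_path(ref, seq):
        warped[i] += seq[j]
        counts[i] += 1
    return warped / counts[:, None]


def supervector(frames):
    """Concatenate the frame vectors of one utterance into a single vector."""
    return np.asarray(frames, dtype=float).reshape(-1)


def euclidean_score(enroll, test):
    """Negative Euclidean distance between supervectors (higher = closer)."""
    warped = warp_to_reference(enroll, test)
    return -float(np.linalg.norm(supervector(enroll) - supervector(warped)))
```

In this sketch, an enrollment utterance with 5 frames of 3-dimensional features yields a 15-dimensional supervector, and a test utterance of any length is reduced to the same 15 dimensions before scoring.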


Pages: 5364-5368

Page count: 5

