Supervector Extraction for Encoding Speaker and Phrase Information with Neural Networks for Text-Dependent Speaker Verification

Cited by: 9

Authors:

Mingote, Victoria [1]

Miguel, Antonio [1]

Ortega, Alfonso [1]

Lleida, Eduardo [1]

Affiliations:

[1] Univ Zaragoza, Aragon Inst Engn Res I3A, ViVoLab, Zaragoza 50018, Spain

Source:

APPLIED SCIENCES-BASEL | 2019, Vol. 9, No. 16

Keywords:

text-dependent speaker verification; HMM alignment; deep neural networks; supervectors

DOI:

10.3390/app9163295

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Subject classification code:

0703

Abstract:

In this paper, we propose a new differentiable neural network with an alignment mechanism for text-dependent speaker verification. Unlike previous work, we do not extract the embedding of an utterance by global average pooling over the temporal dimension. Our system replaces this reduction mechanism with a phonetic phrase alignment model that keeps the temporal structure of each phrase, since the phonetic information is relevant to the verification task. Moreover, we can apply a convolutional neural network as a front-end, and, because the alignment process is differentiable, we can train the network to produce a supervector for each utterance that is discriminative with respect to the speaker and the phrase simultaneously. The advantage of this choice is that the supervector encodes both phrase and speaker information, which provides good performance in text-dependent speaker verification tasks. The verification process is performed using a basic similarity metric. The new model, which uses alignment to produce supervectors, was evaluated on the RSR2015-Part I database, where it provides competitive results compared to similar-size networks that use global average pooling to extract embeddings. Furthermore, we also evaluated this proposal on RSR2015-Part II. To our knowledge, this system achieves the best published results on this second part.
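The contrast the abstract draws between global average pooling and alignment-based supervectors can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names are hypothetical, the uniform left-to-right alignment is only a stand-in for the HMM alignment named in the keywords, and cosine similarity is assumed as one possible "basic similarity metric".

```python
import numpy as np

def average_pool(frames):
    """Global average pooling: collapses the time axis, losing frame order."""
    return frames.mean(axis=0)

def uniform_alignment(n_frames, n_states):
    """Assumed stand-in for an HMM alignment: split frames evenly, left to right."""
    return (np.arange(n_frames) * n_states) // n_frames

def supervector(frames, states, n_states):
    """Average the frames assigned to each state, then concatenate the state means."""
    onehot = np.eye(n_states)[states]              # (T, S) alignment matrix
    counts = onehot.sum(axis=0)                    # frames per state
    sums = onehot.T @ frames                       # (S, D); linear in the frames
    means = sums / np.maximum(counts, 1.0)[:, None]
    return means.reshape(-1)                       # length S * D

def cosine(a, b):
    """Cosine similarity, assumed here as the scoring metric."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Two toy "utterances" with the same frames in opposite temporal order.
frames = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
reversed_frames = frames[::-1]
states = uniform_alignment(len(frames), n_states=2)

pooled_score = cosine(average_pool(frames), average_pool(reversed_frames))
aligned_score = cosine(supervector(frames, states, 2),
                       supervector(reversed_frames, states, 2))
print("average pooling score:", pooled_score)
print("supervector score:", aligned_score)
```

In this toy case average pooling cannot tell the two orderings apart, while the per-state means keep the temporal structure and separate them. Because the supervector is a matrix product between the alignment matrix and the frame features, gradients can flow back to whatever front-end produced the frames, which is one plausible reading of the differentiability the abstract mentions.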

Pages: 12
