Supervector Extraction for Encoding Speaker and Phrase Information with Neural Networks for Text-Dependent Speaker Verification

Cited by: 9
Authors
Mingote, Victoria [1]
Miguel, Antonio [1]
Ortega, Alfonso [1]
Lleida, Eduardo [1]
Affiliations
[1] Univ Zaragoza, Aragon Inst Engn Res I3A, ViVoLab, Zaragoza 50018, Spain
Source
APPLIED SCIENCES-BASEL | 2019 / Vol. 9 / Issue 16
Keywords
text-dependent speaker verification; HMM alignment; deep neural networks; supervectors
DOI
10.3390/app9163295
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
In this paper, we propose a new differentiable neural network with an alignment mechanism for text-dependent speaker verification. Unlike previous works, we do not extract the embedding of an utterance by global average pooling over the temporal dimension. Our system replaces this reduction mechanism with a phonetic phrase alignment model that keeps the temporal structure of each phrase, since the phonetic information is relevant to the verification task. Moreover, we can apply a convolutional neural network as the front-end and, because the alignment process is differentiable, train the network to produce a supervector for each utterance that is discriminative with respect to the speaker and the phrase simultaneously. The advantage of this choice is that the supervector encodes both phrase and speaker information, which provides good performance in text-dependent speaker verification tasks. The verification process is performed using a basic similarity metric. The new model using alignment to produce supervectors was evaluated on the RSR2015-Part I database, providing competitive results compared to similar-size networks that use global average pooling to extract embeddings. Furthermore, we also evaluated this proposal on RSR2015-Part II. To our knowledge, this system achieves the best published results on this second part.
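The contrast the abstract draws between global average pooling and alignment-based supervectors can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the uniform left-to-right segmentation in `uniform_alignment` is a hypothetical stand-in for the phrase alignment model, cosine similarity is assumed as one possible "basic similarity metric", and all function names and sizes are chosen for illustration only.

```python
import numpy as np

def global_average_pooling(frames):
    """Collapse the time axis: (T, D) -> (D,). Frame order is lost."""
    return frames.mean(axis=0)

def uniform_alignment(num_frames, num_states):
    """Hypothetical stand-in for the phrase alignment: assign frames to
    left-to-right states in equal-length chunks. Returns shape (T,)."""
    return (np.arange(num_frames) * num_states) // num_frames

def supervector(frames, states, num_states):
    """Average the frames assigned to each state and concatenate the
    per-state means: (T, D) -> (num_states * D,)."""
    one_hot = np.eye(num_states)[states]        # (T, Q) alignment matrix
    counts = one_hot.sum(axis=0)[:, None]       # (Q, 1) frames per state
    # The pooling is a matrix product, so it is differentiable w.r.t. frames.
    state_means = (one_hot.T @ frames) / np.maximum(counts, 1.0)
    return state_means.reshape(-1)

def cosine_similarity(a, b):
    """Assumed scoring function; the abstract only says 'basic metric'."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy utterance: 200 frames of 8-dimensional features (arbitrary sizes).
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 8))
states = uniform_alignment(len(frames), num_states=4)

pooled = global_average_pooling(frames)
sv = supervector(frames, states, num_states=4)

# Reversing the frame order leaves the pooled embedding unchanged,
# but permutes the state blocks of the supervector.
pooled_reversed = global_average_pooling(frames[::-1])
sv_reversed = supervector(frames[::-1], states, num_states=4)
score = cosine_similarity(sv, sv_reversed)
```

In this sketch the pooled embedding cannot distinguish an utterance from its time-reversed copy, while the supervector can, which is the sense in which the alignment keeps the temporal structure of the phrase.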
Pages: 12