CNN-based joint mapping of short and long utterance i-vectors for speaker verification using short utterances

Cited by: 8
Authors
Guo, Jinxi [1]
Nookala, Usha Amrutha [1]
Alwan, Abeer [1]
Affiliation
[1] Univ Calif Los Angeles, Dept Elect Engn, Los Angeles, CA 90095 USA
Keywords
speaker verification; text-independent; short utterances; i-vectors; CNNs; joint modeling; PLDA
DOI
10.21437/Interspeech.2017-430
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text-independent speaker recognition using short utterances is a highly challenging task due to the large variation and content mismatch between short utterances. Systems based on i-vectors and probabilistic linear discriminant analysis (PLDA) have become the standard in speaker verification applications, but they are less effective with short utterances. To address this issue, we propose a novel method that trains a convolutional neural network (CNN) model to map the i-vectors extracted from short utterances to the corresponding long-utterance i-vectors. To simultaneously learn a representation of the original short-utterance i-vectors and fit the target long-utterance i-vectors, we jointly train a supervised regression model with an autoencoder using CNNs. The trained CNN model is then used to generate the mapped version of the short-utterance i-vectors in the evaluation stage. We compare our proposed CNN-based joint mapping method with a GMM-based joint modeling method under matched and mismatched PLDA training conditions. Experimental results on the NIST SRE 2008 dataset show that the proposed technique achieves up to 30% relative improvement under duration-mismatched PLDA training conditions and outperforms the GMM-based method. The improved systems also outperform the matched-length PLDA training condition that uses short utterances.
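To make the "joint" objective concrete, here is a minimal NumPy sketch of one way to combine a supervised regression loss with an autoencoder reconstruction loss over a shared convolutional encoder. It is not the authors' network: the abstract does not give the architecture, so the single 1-D convolution with ReLU, the two linear heads (`W_reg`, `W_dec`), the weighting `lam`, and the function names are all hypothetical choices made for illustration.

```python
import numpy as np


def conv1d_same(x, kernel):
    """Hypothetical encoder layer: 1-D convolution across i-vector dimensions,
    returning an output with the same length as x."""
    return np.convolve(x, kernel, mode="same")


def joint_loss(x_short, y_long, kernel, W_reg, W_dec, lam=1.0):
    """Sketch of a joint regression + autoencoder objective (assumed form).

    x_short: i-vector from a short utterance (network input)
    y_long:  i-vector from the corresponding long utterance (regression target)
    Returns (total_loss, mapped_ivector).
    """
    # Shared encoder: convolution followed by ReLU (assumed, not from the paper)
    h = np.maximum(conv1d_same(x_short, kernel), 0.0)
    # Regression head: predicts the long-utterance i-vector (the "mapped" i-vector)
    y_hat = W_reg @ h
    # Decoder head: reconstructs the short-utterance input (autoencoder branch)
    x_rec = W_dec @ h
    reg_loss = np.mean((y_hat - y_long) ** 2)
    rec_loss = np.mean((x_rec - x_short) ** 2)
    # Total loss: regression term plus weighted reconstruction term
    return reg_loss + lam * rec_loss, y_hat
```

In a training loop, this loss would be minimized over pairs of short- and long-utterance i-vectors from the same recording; at evaluation time only the regression output `y_hat` would be kept and passed to PLDA scoring in place of the raw short-utterance i-vector. The reconstruction term acts as a regularizer that keeps the shared representation faithful to the input rather than fitting the targets alone.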
Pages: 3712-3716
Number of pages: 5
Related papers
50 in total
  • [31] Study of the Effect of I-vector Modeling on Short and Mismatch Utterance Duration for Speaker Verification
    Sarkar, A. K.
    Matrouf, D.
    Bousquet, P. M.
    Bonastre, J. F.
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2661 - 2664
  • [32] Combining Amplitude and Phase-based Features for Speaker Verification with Short Duration Utterances
    Alam, Md Jahangir
    Kenny, Patrick
    Stafylakis, Themos
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 249 - 253
  • [33] Text-dependent speaker verification based on i-vectors, Neural Networks and Hidden Markov Models
    Zeinali, Hossein
    Sameti, Hossein
    Burget, Lukas
    Cernocky, Jan Honza
    COMPUTER SPEECH AND LANGUAGE, 2017, 46 : 53 - 71
  • [34] Improving Short Utterance based I-vector Speaker Recognition using Source and Utterance-Duration Normalization Techniques
    Kanagasundaram, A.
    Dean, D.
    Gonzalez-Dominguez, J.
    Sridharan, S.
    Ramos, D.
    Gonzalez-Rodriguez, J.
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2464 - 2468
  • [35] Experimental studies for improving the performance of children's speaker verification system using short utterances
    Aziz, Shahid
    Shahnawazuddin, S.
    APPLIED ACOUSTICS, 2024, 216
  • [36] Short-Utterance-Based Children’s Speaker Verification in Low-Resource Conditions
    Shahid Aziz
    S. Ankita
    Circuits, Systems, and Signal Processing, 2024, 43 : 1715 - 1740
  • [37] Short-Utterance-Based Children's Speaker Verification in Low-Resource Conditions
    Aziz, Shahid
    Ankita
    Shahnawazuddin, S.
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 43 (3) : 1715 - 1740
  • [38] SHORT UTTERANCE COMPENSATION IN SPEAKER VERIFICATION VIA COSINE-BASED TEACHER-STUDENT LEARNING OF SPEAKER EMBEDDINGS
    Jung, Jee-weon
    Heo, Hee-Soo
    Shim, Hye-jin
    Yu, Ha-Jin
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 335 - 341
  • [39] SPEAKER AGE ESTIMATION ON CONVERSATIONAL TELEPHONE SPEECH USING SENONE POSTERIOR BASED I-VECTORS
    Sadjadi, Seyed Omid
    Ganapathy, Sriram
    Pelecanos, Jason W.
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5040 - 5044
  • [40] A study on the effects of using short utterance length development data in the design of GPLDA speaker verification systems
    Kanagasundaram A.
    Dean D.
    Sridharan S.
    Ghaemmaghami H.
    Fookes C.
    International Journal of Speech Technology, 2017, 20 (2) : 247 - 259