CNN-based joint mapping of short and long utterance i-vectors for speaker verification using short utterances

Cited by: 8
Authors
Guo, Jinxi [1]
Nookala, Usha Amrutha [1]
Alwan, Abeer [1]
Affiliation
[1] Univ Calif Los Angeles, Dept Elect Engn, Los Angeles, CA 90095 USA
Keywords
speaker verification; text-independent; short utterances; i-vectors; CNNs; joint modeling; PLDA
DOI
10.21437/Interspeech.2017-430
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-independent speaker recognition using short utterances is a highly challenging task because of the large variability and content mismatch between short utterances. Systems based on i-vectors and probabilistic linear discriminant analysis (PLDA) have become the standard in speaker verification applications, but they are less effective with short utterances. To address this issue, we propose a novel method that trains a convolutional neural network (CNN) to map i-vectors extracted from short utterances to the corresponding long-utterance i-vectors. To simultaneously learn a representation of the original short-utterance i-vectors and fit the target long-utterance i-vectors, we jointly train a supervised regression model and an autoencoder using CNNs. The trained CNN model is then used to generate mapped versions of the short-utterance i-vectors in the evaluation stage. We compare the proposed CNN-based joint mapping method with a GMM-based joint modeling method under matched and mismatched PLDA training conditions. Experimental results on the NIST SRE 2008 dataset show that the proposed technique achieves up to 30% relative improvement under duration-mismatched PLDA training conditions and outperforms the GMM-based method. The improved systems also outperform systems trained under the matched-length PLDA condition using short utterances.
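The joint mapping idea in the abstract (one shared CNN representation feeding both a regression target and an autoencoder reconstruction) can be illustrated with a small forward-pass sketch. This is not the authors' architecture: the abstract gives no layer counts, i-vector dimensions, or loss weights, so the single convolution layer, the class name `JointMapper`, the kernel size, and the weighting factor `alpha` are all hypothetical. No training loop is shown; the point is only how the combined objective is assembled from the two heads.

```python
import random
from typing import List, Tuple

Vector = List[float]


def conv1d(x: Vector, kernel: Vector) -> Vector:
    """'Valid' 1-D convolution (cross-correlation, as in most CNN libraries)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def linear(v: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Dense layer; `weights` has one row per output unit."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


class JointMapper:
    """Hypothetical shared CNN encoder with two heads:
    - a regression head that maps a short-utterance i-vector to a long one;
    - a decoder head that reconstructs the short-utterance i-vector (autoencoder).
    """

    def __init__(self, dim: int, kernel_size: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.kernel = [rng.gauss(0.0, 0.1) for _ in range(kernel_size)]
        hidden = dim - kernel_size + 1  # length after a 'valid' convolution

        def init(rows: int, cols: int) -> List[Vector]:
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_reg, self.b_reg = init(dim, hidden), [0.0] * dim
        self.w_dec, self.b_dec = init(dim, hidden), [0.0] * dim

    def encode(self, short_ivec: Vector) -> Vector:
        """Shared representation used by both heads."""
        return relu(conv1d(short_ivec, self.kernel))

    def forward(self, short_ivec: Vector) -> Tuple[Vector, Vector]:
        h = self.encode(short_ivec)
        mapped = linear(h, self.w_reg, self.b_reg)  # estimate of the long i-vector
        recon = linear(h, self.w_dec, self.b_dec)   # reconstruction of the input
        return mapped, recon

    def joint_loss(self, short_ivec: Vector, long_ivec: Vector, alpha: float = 0.5) -> float:
        """Regression loss plus alpha-weighted reconstruction loss."""
        mapped, recon = self.forward(short_ivec)
        return mse(mapped, long_ivec) + alpha * mse(recon, short_ivec)
```

At evaluation time only the `mapped` output of `forward` would be kept and passed on to PLDA scoring; the reconstruction head exists to regularize the shared representation during training. Sharing one encoder between both heads is what makes the training "joint" in this sketch.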
Pages: 3712-3716
Number of pages: 5