CNN-based joint mapping of short and long utterance i-vectors for speaker verification using short utterances

Cited by: 8
Authors
Guo, Jinxi [1]
Nookala, Usha Amrutha [1]
Alwan, Abeer [1]
Affiliation
[1] Univ Calif Los Angeles, Dept Elect Engn, Los Angeles, CA 90095 USA
Keywords
speaker verification; text-independent; short utterances; i-vectors; CNNs; joint modeling; PLDA
DOI
10.21437/Interspeech.2017-430
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-independent speaker recognition using short utterances is a highly challenging task due to the large variation and content mismatch between short utterances. I-vector and probabilistic linear discriminant analysis (PLDA) based systems have become the standard in speaker verification applications, but they are less effective with short utterances. To address this issue, we propose a novel method that trains a convolutional neural network (CNN) model to map the i-vectors extracted from short utterances to the corresponding long-utterance i-vectors. To simultaneously learn a representation of the original short-utterance i-vectors and fit the target long-utterance i-vectors, we jointly train a supervised regression model with an autoencoder using CNNs. The trained CNN model is then used to generate mapped versions of the short-utterance i-vectors in the evaluation stage. We compare our proposed CNN-based joint mapping method with a GMM-based joint modeling method under matched and mismatched PLDA training conditions. Experimental results on the NIST SRE 2008 dataset show that the proposed technique achieves up to 30% relative improvement under duration-mismatched PLDA training conditions and outperforms the GMM-based method. The improved systems also outperform the matched-length PLDA training condition that uses short utterances.
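The abstract describes jointly training two branches on short-utterance i-vectors: a regression branch that predicts the corresponding long-utterance i-vector, and an autoencoder branch that reconstructs the short-utterance i-vector itself. The snippet below is only a minimal sketch of what such a joint objective could look like, assuming mean-squared error for both terms and a hypothetical weighting factor `alpha`; the paper's actual loss, weighting, and CNN architecture are not given in this record, and the network is omitted (its two outputs are passed in directly). The helper names `mse` and `joint_mapping_loss` are illustrative, not from the paper.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_mapping_loss(
    mapped_long: Sequence[float],
    target_long: Sequence[float],
    reconstructed_short: Sequence[float],
    short_ivec: Sequence[float],
    alpha: float = 0.5,  # hypothetical weight on the autoencoder term (assumption)
) -> float:
    """Assumed joint objective: regression (short -> long) plus weighted reconstruction (short -> short).

    `mapped_long` and `reconstructed_short` stand in for the two CNN outputs;
    the CNN itself is not modeled here.
    """
    regression = mse(mapped_long, target_long)
    reconstruction = mse(reconstructed_short, short_ivec)
    return regression + alpha * reconstruction


# Toy 3-dimensional "i-vectors" (made-up values, for illustration only)
short_ivec = [1.0, 0.0, 2.0]
target_long = [1.0, 1.0, 2.0]
loss = joint_mapping_loss([1.0, 0.5, 2.0], target_long, [1.0, 0.0, 2.0], short_ivec)
print(round(loss, 4))  # 0.0833
```

Setting `alpha` to 0 reduces this sketch to plain regression; a positive weight encourages the shared layers to retain information about the original short-utterance i-vector, which matches the motivation the abstract gives for training the regression and autoencoder objectives jointly.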
Pages: 3712-3716
Number of pages: 5