VC-AUG: Voice Conversion Based Data Augmentation for Text-Dependent Speaker Verification

Cited by: 0
Authors
Qin, Xiaoyi [1 ]
Yang, Yaogen [1 ]
Lin, Shi [1 ]
Wang, Xuyang [2 ]
Wang, Junjie [2 ]
Li, Ming [1 ]
Affiliations
[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China
[2] AI Lab Lenovo Res, Beijing, Peoples R China
Keywords
speaker verification; voice conversion; text-dependent; data augmentation
DOI
10.1007/978-981-99-2401-1_21
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
In this paper, we focus on improving the performance of text-dependent speaker verification systems in the scenario of limited training data. Deep learning based text-dependent speaker verification systems generally require a large-scale text-dependent training set, which can be expensive in both labor and cost, especially for customized new wake-up words. Recent studies have proposed voice conversion systems that can generate high-quality synthesized speech for both seen and unseen speakers. Inspired by those works, we adopt two different voice conversion methods, as well as a very simple re-sampling approach, to generate new text-dependent speech samples for data augmentation. Experimental results show that the proposed method significantly improves the Equal Error Rate from 6.51% to 4.48% in the limited-training-data scenario. In addition, we also explore data augmentation based on out-of-set and unseen-speaker voice conversion.
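The "very simple re-sampling approach" mentioned in the abstract can be illustrated with a short sketch: resampling a waveform by a small factor and then treating the result as if it were still at the original sample rate shifts both pitch and tempo, so each perturbed copy acts as a new pseudo-speaker rendition of the wake-up phrase. This is a minimal illustration under assumed settings (the function name `resample_augment`, the 16 kHz rate, and the 0.9/1.0/1.1 factors are our choices, not details from the paper):

```python
import numpy as np
from scipy.signal import resample_poly

def resample_augment(wave: np.ndarray, factor: float) -> np.ndarray:
    """Resample `wave` so that, played back at the original sample rate,
    it sounds `factor` times faster (output length ~= len(wave) / factor).
    Uses polyphase resampling with a rational approximation of `factor`."""
    down = int(round(factor * 100))
    return resample_poly(wave, 100, down)

# One utterance, plus augmented copies at common perturbation factors.
utt = np.random.randn(16000).astype(np.float32)  # 1 s of audio at 16 kHz
augmented = [resample_augment(utt, f) for f in (0.9, 1.0, 1.1)]
```

Factors close to 1.0 (e.g. 0.9 and 1.1, as in Kaldi-style speed perturbation) are typical, since larger shifts distort the speech beyond what helps verification training.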
Pages: 227-237 (11 pages)