VC-AUG: Voice Conversion Based Data Augmentation for Text-Dependent Speaker Verification

被引:0
|
作者
Qin, Xiaoyi [1 ]
Yang, Yaogen [1 ]
Lin, Shi [1 ]
Wang, Xuyang [2 ]
Wang, Junjie [2 ]
Li, Ming [1 ]
机构
[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China
[2] AI Lab Lenovo Res, Beijing, Peoples R China
关键词
speaker verification; voices conversion; text-dependent; data augmentation;
D O I
10.1007/978-981-99-2401-1_21
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
In this paper, we focus on improving the performance of the text-dependent speaker verification system in the scenario of limited training data. The deep learning based text-dependent speaker verification system generally needs a large-scale text-dependent training data set which could be both labor and cost expensive, especially for customized new wake-up words. In recent studies, voice conversion systems that can generate high quality synthesized speech of seen and unseen speakers have been proposed. Inspired by those works, we adopt two different voice conversion methods as well as the very simple re-sampling approach to generate new text-dependent speech samples for data augmentation purposes. Experimental results show that the proposed method significantly improves the Equal Error Rate performance from 6.51% to 4.48% in the scenario of limited training data. In addition, we also explore the out-of-set and unseen speaker voice conversion based data augmentation.
引用
收藏
页码:227 / 237
页数:11
相关论文
共 50 条
  • [41] Tandem Features for Text-dependent Speaker Verification on the RedDots Corpus
    Alam, Md Jahangir
    Kenny, Patrick
    Gupta, Vishwa
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 420 - 424
  • [42] Multi-Task Learning for Text-dependent Speaker Verification
    Chen, Nanxin
    Qian, Yanmin
    Yu, Kai
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 185 - 189
  • [43] Unsupervised Learning of HMM Topology for Text-dependent Speaker Verification
    Liu, Ming
    Huang, Thomas
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 921 - 924
  • [44] EXPLORING SEQUENTIAL CHARACTERISTICS IN SPEAKER BOTTLENECK FEATURE FOR TEXT-DEPENDENT SPEAKER VERIFICATION
    Chen, Liping
    Zhao, Yong
    Zhang, Shi-Xiong
    Li, Jie
    Ye, Guoli
    Soong, Frank
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5364 - 5368
  • [45] ON INSTANTANEOUS AND TRANSITIONAL SPECTRAL INFORMATION FOR TEXT-DEPENDENT SPEAKER VERIFICATION
    BERNASCONI, C
    SPEECH COMMUNICATION, 1990, 9 (02) : 129 - 139
  • [46] Addressing Text-Dependent Speaker Verification Using Singing Speech
    Shi, Yan
    Zhou, Juanjuan
    Long, Yanhua
    Li, Yijie
    Mao, Hongwei
    APPLIED SCIENCES-BASEL, 2019, 9 (13):
  • [47] Unsupervised Data-driven Hidden Markov Modeling for Text-dependent Speaker Verification
    Petrovska-Delacretaz, Dijana
    Khemiri, Houssemeddine
    ICPRAM: PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS, 2017, : 199 - 207
  • [48] EFFECTS OF GENDER INFORMATION IN TEXT-INDEPENDENT AND TEXT-DEPENDENT SPEAKER VERIFICATION
    Kanervisto, Anssi
    Vestman, Ville
    Sahidullah, Md
    Hautamaki, Ville
    Kinnunen, Tomi
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5360 - 5364
  • [49] Weighting scores to improve speaker-dependent threshold estimation in text-dependent speaker verification
    Saeta, JR
    Hernando, J
    NONLINEAR ANALYSES AND ALGORITHMS FOR SPEECH PROCESSING, 2005, 3817 : 81 - 91
  • [50] A Phonetic Alternative to Cross-language Voice Conversion in a Text-dependent Context: Evaluation of Speaker Identity
    Yanagisawa, Kayoko
    Huckvale, Mark
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2150 - 2153