VC-AUG: Voice Conversion Based Data Augmentation for Text-Dependent Speaker Verification

Times Cited: 0
Authors
Qin, Xiaoyi [1 ]
Yang, Yaogen [1 ]
Lin, Shi [1 ]
Wang, Xuyang [2 ]
Wang, Junjie [2 ]
Li, Ming [1 ]
Affiliations
[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China
[2] AI Lab Lenovo Res, Beijing, Peoples R China
Keywords
speaker verification; voice conversion; text-dependent; data augmentation
DOI
10.1007/978-981-99-2401-1_21
Chinese Library Classification
O42 [Acoustics]
Subject Classification Code
070206; 082403
Abstract
In this paper, we focus on improving the performance of text-dependent speaker verification systems in the scenario of limited training data. Deep learning based text-dependent speaker verification systems generally require a large-scale text-dependent training set, which is expensive in both labor and cost to collect, especially for customized new wake-up words. Recent studies have proposed voice conversion systems that can generate high-quality synthesized speech for both seen and unseen speakers. Inspired by those works, we adopt two different voice conversion methods, as well as a very simple re-sampling approach, to generate new text-dependent speech samples for data augmentation purposes. Experimental results show that the proposed method significantly improves the Equal Error Rate from 6.51% to 4.48% in the limited-training-data scenario. In addition, we also explore voice conversion based data augmentation with out-of-set and unseen speakers.
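As a minimal sketch of the simplest of the three augmentation strategies summarized above, the snippet below generates re-sampled (speed-perturbed) copies of an utterance that can be added to the training pool. The helper name resample_augment, the perturbation factors 0.9 and 1.1, and the use of the librosa and soundfile libraries are assumptions made for illustration only and are not details taken from the paper.

import librosa
import soundfile as sf

def resample_augment(wav_path, factors=(0.9, 1.1), out_prefix="aug"):
    # Load the utterance at its native sampling rate.
    wav, sr = librosa.load(wav_path, sr=None)
    out_paths = []
    for f in factors:
        # Resample the waveform to sr * f but write it back at the original
        # rate sr: the copy plays slower/faster and pitch-shifted, and such
        # copies are commonly treated as new pseudo-speakers during training.
        perturbed = librosa.resample(wav, orig_sr=sr, target_sr=int(sr * f))
        out_path = f"{out_prefix}_x{f}.wav"
        sf.write(out_path, perturbed, sr)
        out_paths.append(out_path)
    return out_paths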
Pages: 227-237
Number of Pages: 11
Related Papers
50 records in total
  • [1] Data Augmentation Enhanced Speaker Enrollment for Text-dependent Speaker Verification
    Sarkar, Achintya Kumar
    Sarma, Himangshu
    Dwivedi, Priyanka
    Tan, Zheng-Hua
    2020 3RD INTERNATIONAL CONFERENCE ON ENERGY, POWER AND ENVIRONMENT: TOWARDS CLEAN ENERGY TECHNOLOGIES (ICEPE 2020), 2021,
  • [2] On the study of replay and voice conversion attacks to text-dependent speaker verification
    Wu, Zhizheng
    Li, Haizhou
    MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (09) : 5311 - 5327
  • [3] SYNAUG: SYNTHESIS-BASED DATA AUGMENTATION FOR TEXT-DEPENDENT SPEAKER VERIFICATION
    Du, Chenpeng
    Han, Bing
    Wang, Shuai
    Qian, Yanmin
    Yu, Kai
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5844 - 5848
  • [4] KNOWLEDGE DISTILLATION AND RANDOM ERASING DATA AUGMENTATION FOR TEXT-DEPENDENT SPEAKER VERIFICATION
    Mingote, Victoria
    Miguel, Antonio
    Ribas, Dayana
    Ortega, Alfonso
    Lleida, Eduardo
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6824 - 6828
  • [5] Voice Transformation-based Spoofing of Text-Dependent Speaker Verification Systems
    Kons, Zvi
    Aronowitz, Hagai
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 945 - 949
  • [6] Text-dependent speaker verification system
    Qin, Bing
    Chen, Huipeng
    Li, Guangqi
    Liu, Songbo
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2000, 32 (04): : 16 - 18
  • [7] Text-Dependent Speaker Verification System: A Review
    Debnath, Saswati
    Soni, B.
    Baruah, U.
    Sah, D. K.
    PROCEEDINGS OF 2015 IEEE 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND CONTROL (ISCO), 2015,
  • [8] Deep feature for text-dependent speaker verification
    Liu, Yuan
    Qian, Yanmin
    Chen, Nanxin
    Fu, Tianfan
    Zhang, Ya
    Yu, Kai
    SPEECH COMMUNICATION, 2015, 73 : 1 - 13
  • [9] ATTENTION-BASED MODELS FOR TEXT-DEPENDENT SPEAKER VERIFICATION
    Chowdhury, F. A. Rezaur Rahman
    Wang, Quan
    Moreno, Ignacio Lopez
    Wan, Li
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5359 - 5363