Part-Syllable Transformation-Based Voice Conversion with Very Limited Training Data

Cited by: 0
Authors
Mohammad Javad Jannati
Abolghasem Sayadiyan
Affiliations
[1] Iran University of Science and Technology, School of Computer Engineering
[2] Amirkabir University of Technology, Department of Electrical Engineering
Keywords
Voice conversion; Very limited training data; Part-syllable
DOI
Not available
Abstract
Voice conversion suffers from two drawbacks: it requires a large number of sentences from the target speaker, and concatenative methods introduce concatenation error. This research introduces the part-syllable transformation-based voice conversion (PST-VC) method, which performs voice conversion with very limited data from a target speaker while also reducing concatenation error. In this method, every syllable is segmented into three parts: left transition, vowel core, and right transition. Using this new language unit, called the part-syllable (PS), PST-VC reduces concatenation error by moving segmentation and concatenation from the transition points to the relatively stable points of a syllable. Since the greatest amount of information about any speaker is contained in the vowels, PST-VC uses this information to transform the vowels into all of the PSs of the language. In this approach, a series of transformations is trained that can generate all of the PSs of a target speaker’s voice from a single vowel core given as input. With all of the PSs available, any utterance in the target speaker’s voice can be imitated. PST-VC therefore reduces the required training material to a single-syllable word and also reduces concatenation error.
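The two ideas in the abstract, cutting each syllable at the edges of its vowel core and generating part-syllables from one vowel core, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the frame representation, the externally supplied `core_start` and `core_end` indices, the function names, and above all the use of a per-PS affine map as the "transformation" (the abstract does not say what form the trained transformations take).

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple

Frame = List[float]                              # hypothetical: one feature vector per frame
Affine = Tuple[List[List[float]], List[float]]   # hypothetical: (matrix, bias) per part-syllable


class PartSyllables(NamedTuple):
    left_transition: List[Frame]
    vowel_core: List[Frame]
    right_transition: List[Frame]


def split_syllable(frames: Sequence[Frame], core_start: int, core_end: int) -> PartSyllables:
    """Cut a syllable at the two edges of its vowel core.

    core_start and core_end are assumed to come from an external labelling
    step; the abstract does not describe how they are located.
    """
    if not 0 <= core_start < core_end <= len(frames):
        raise ValueError("vowel core must be a non-empty span inside the syllable")
    return PartSyllables(
        left_transition=list(frames[:core_start]),
        vowel_core=list(frames[core_start:core_end]),
        right_transition=list(frames[core_end:]),
    )


def mean_frame(frames: Sequence[Frame]) -> Frame:
    """Average the frames of a segment into one summary vector."""
    if not frames:
        raise ValueError("cannot average an empty segment")
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def apply_affine(transform: Affine, vector: Frame) -> Frame:
    """Compute matrix @ vector + bias with plain lists."""
    matrix, bias = transform
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(matrix, bias)]


def generate_part_syllables(core: Frame, transforms: Dict[str, Affine]) -> Dict[str, Frame]:
    """Map one vowel-core vector to a feature vector for every named part-syllable.

    The affine form is an illustrative stand-in, not the paper's model.
    """
    return {name: apply_affine(transform, core) for name, transform in transforms.items()}


if __name__ == "__main__":
    # Toy four-frame syllable with two features per frame (made-up numbers).
    syllable = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
    parts = split_syllable(syllable, core_start=1, core_end=3)
    core = mean_frame(parts.vowel_core)
    toy_transforms = {"b-a": ([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5])}
    print(generate_part_syllables(core, toy_transforms))
```

In this sketch both cut points lie at the boundaries of the vowel core, which matches the abstract's point that segmentation and concatenation happen at relatively stable points rather than inside transitions.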
Pages: 1935-1957
Page count: 22