Part-Syllable Transformation-Based Voice Conversion with Very Limited Training Data

Citations: 0
Authors
Mohammad Javad Jannati
Abolghasem Sayadiyan
Affiliations
[1] Iran University of Science and Technology, School of Computer Engineering
[2] Amirkabir University of Technology, Department of Electrical Engineering
Keywords
Voice conversion; Very limited training data; Part-syllable;
DOI
Not available
Abstract
Voice conversion suffers from two drawbacks: it requires a large number of sentences from the target speaker, and concatenative methods introduce concatenation error. In this research, a part-syllable transformation-based voice conversion (PST-VC) method is introduced that performs voice conversion with very limited data from a target speaker while simultaneously reducing concatenation error. In this method, every syllable is segmented into three parts: left transition, vowel core, and right transition. Using this new language unit, called a part-syllable (PS), PST-VC reduces concatenation error by moving segmentation and concatenation from the transition points to the relatively stable points of a syllable. Since the greatest amount of speaker information is contained in the vowels, the PST-VC method uses this information to transform the vowels into all of the PSs of the language. In this approach, a series of transformations is trained that can generate all of the PSs of a target speaker's voice from a single vowel core received as input. With all of the PSs available, any utterance of the target speaker can be imitated. Therefore, PST-VC reduces the required training data to a single-syllable word and also reduces the concatenation error.
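The abstract describes the pipeline only at a high level: split each syllable into left transition, vowel core, and right transition, then use trained per-PS transforms to generate target-speaker part-syllables from a source vowel core and concatenate at the stable vowel-core boundaries. The following is a minimal sketch of that flow on frame-level spectral features; the feature dimensions, boundary indices, the simple affine per-frame transform, and all function names are illustrative assumptions, not the authors' actual model.

```python
# Illustrative sketch of the part-syllable (PS) idea, assuming frame-level
# spectral features (e.g., MFCCs) have already been extracted elsewhere.
import numpy as np

def split_into_part_syllables(frames, core_start, core_end):
    """Split a syllable's feature frames into the three PS units:
    left transition, vowel core, and right transition."""
    left = frames[:core_start]           # left transition (onset -> vowel)
    core = frames[core_start:core_end]   # relatively stable vowel core
    right = frames[core_end:]            # right transition (vowel -> coda)
    return left, core, right

def apply_ps_transform(vowel_core, weight, bias):
    """Apply one trained transform mapping a source vowel core to a
    target-speaker PS (a per-frame affine map stands in here for
    whatever regression model is actually trained)."""
    return vowel_core @ weight + bias

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    syllable = rng.standard_normal((40, 13))   # 40 frames x 13 coefficients (dummy data)
    left, core, right = split_into_part_syllables(syllable, core_start=12, core_end=28)

    # One transform per target-speaker PS; training them is outside this sketch.
    w = np.eye(13) + 0.1 * rng.standard_normal((13, 13))
    b = 0.01 * rng.standard_normal(13)
    target_left_ps = apply_ps_transform(core, w, b)

    # Concatenation happens at the stable vowel-core boundaries rather than at
    # the transition points, which is the claimed source of reduced error.
    converted = np.vstack([target_left_ps, core, right])
    print(left.shape, core.shape, right.shape, converted.shape)
```

In this reading, the key design choice is that only the vowel core of a single-syllable word is needed from the target speaker, because every other PS is generated from it by a dedicated transform.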
Pages: 1935 - 1957
Page count: 22
Related Papers
50 items in total
  • [41] VTLN Based Approaches for Speech Recognition with Very Limited Training Speakers
    Ban, Sung Min
    Choi, Bo Kyung
    Choi, Young Ho
    Kim, Hyung Soon
    PROCEEDINGS FIFTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS, MODELLING AND SIMULATION, 2014, : 285 - 288
  • [42] VOICE VERIFICATION USING I-VECTORS AND NEURAL NETWORKS WITH LIMITED TRAINING DATA
    Mamyrbayev, O. Zh.
    Othman, M.
    Akhmediyarova, A. T.
    Kydyrbekova, A. S.
    Mekebayev, N. O.
    BULLETIN OF THE NATIONAL ACADEMY OF SCIENCES OF THE REPUBLIC OF KAZAKHSTAN, 2019, (03): : 36 - 43
  • [43] Synthesis of Spontaneous Speech With Syllable Contraction Using State-Based Context-Dependent Voice Transformation
    Wu, Chung-Hsien
    Huang, Yi-Chin
    Lee, Chung-Han
    Guo, Jun-Cheng
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (03) : 585 - 595
  • [44] Amplitude Transformation-Based Blind Equalization Part I: Suitable for High-Order PAM Signals
    Rao, Wei
    2011 3RD INTERNATIONAL CONFERENCE ON ENVIRONMENTAL SCIENCE AND INFORMATION APPLICATION TECHNOLOGY ESIAT 2011, VOL 10, PT B, 2011, 10 : 1276 - 1281
  • [45] Amplitude Transformation-Based Blind Equalization Part II: Suitable for High-Order QAM Signals
    Rao, Wei
    2011 3RD INTERNATIONAL CONFERENCE ON ENVIRONMENTAL SCIENCE AND INFORMATION APPLICATION TECHNOLOGY ESIAT 2011, VOL 10, PT B, 2011, 10 : 1282 - 1286
  • [46] A robust transformation-based learning approach using ripple down rules for part-of-speech tagging
    Dat Quoc Nguyen
    Dai Quoc Nguyen
    Dang Duc Pham
    Son Bao Pham
    AI COMMUNICATIONS, 2016, 29 (03) : 409 - 422
  • [47] VOICE-TRANSFORMATION-BASED DATA AUGMENTATION FOR PROSODIC CLASSIFICATION
    Fernandez, Raul
    Rosenberg, Andrew
    Sorin, Alexander
    Ramabhadran, Bhuvana
    Hoory, Ron
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5530 - 5534
  • [48] An improved spectral and prosodic transformation method in straight-based voice conversion
    Qin, L
    Chen, GP
    Ling, ZH
    Dai, LR
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 21 - 24
  • [49] Voice Conversion Based on Empirical Conditional Distribution in Resource-limited Scenarios
    Xu, Ning
    Tang, Yibin
    Bao, Jingyi
    Yao, Xiao
    Jiang, Aimin
    Liu, Xiaofeng
    2015 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TW), 2015, : 172 - 173
  • [50] Emotional Voice Conversion with Adaptive Scales F0 based on Wavelet Transform using Limited Amount of Emotional Data
    Luo, Zhaojie
    Chen, Jinhui
    Takiguchi, Tetsuya
    Ariki, Yasuo
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3399 - 3403