Improving the performance of MGM-based voice conversion by preparing training data method

Authors
Zuo, GY [1 ]
Liu, WJ [1 ]
Ruan, XG [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes an approach that improves both the target speaker's individuality and the quality of the converted speech by preparing the training data. In mixture Gaussian spectral mapping (MGM) based voice conversion, spectral feature representations are analyzed to obtain the right feature associations between the source and target characteristics. A voiced/unvoiced (V/UV) decision scheme for time alignment is provided to select the right data for training the mixture Gaussian spectral mapping function while removing misaligned data. Experiments apply different spectral representation methods and V/UV decision strategies to the MGM functions. When linear predictive cepstral coefficients (LPCC) are used for time alignment and V/UV decisions are adopted to remove bad data, the results show that the conversion function achieves better accuracy and that the proposed method effectively improves the overall performance of voice conversion.
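The data-preparation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the alignment algorithm or the exact V/UV rule, so the sketch assumes plain dynamic time warping (DTW) with Euclidean distance over per-frame feature vectors (standing in for LPCC) and assumes that an aligned frame pair is kept only when the source and target V/UV flags agree. The function names `dtw_path` and `filter_by_voicing` are hypothetical.

```python
import math


def dtw_path(src, tgt):
    """Align two feature sequences with plain DTW (assumed, not from the paper).

    src, tgt: non-empty lists of equal-dimension feature vectors.
    Returns the warping path as a list of (src_index, tgt_index) pairs.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (cost[i - 1][j], i - 1, j),          # advance source only
            (cost[i][j - 1], i, j - 1),          # advance target only
        ]
        _, i, j = min(candidates)
    path.reverse()
    return path


def filter_by_voicing(path, src_voiced, tgt_voiced):
    """Keep aligned pairs whose V/UV flags agree (assumed filtering rule)."""
    return [(i, j) for i, j in path if src_voiced[i] == tgt_voiced[j]]


if __name__ == "__main__":
    # Toy one-dimensional "features"; real input would be LPCC frames.
    source = [(0.0,), (1.0,), (2.0,)]
    target = [(0.0,), (0.0,), (2.0,)]
    aligned = dtw_path(source, target)
    kept = filter_by_voicing(aligned, [True, False, True], [True, True, True])
    print(aligned, kept)
```

Only the pairs that survive the filter would then be used to train the mapping function; how the V/UV flags themselves are computed is left outside this sketch.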
Pages: 181-184
Page count: 4