Improving the performance of MGM-based voice conversion by preparing training data method

Cited by: 0
Authors
Zuo, GY [1 ]
Liu, WJ [1 ]
Ruan, XG [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes an approach that improves both the target speaker's individuality and the quality of converted speech through careful preparation of the training data. In mixture Gaussian spectral mapping (MGM) based voice conversion, spectral feature representations are analyzed to obtain correct feature associations between source and target characteristics. A voiced/unvoiced (V/UV) decision scheme for time alignment is provided to select correctly aligned data for training the mixture Gaussian spectral mapping function while removing misaligned data. Experiments apply different spectral representation methods and V/UV decision strategies to the MGM functions. When linear predictive cepstral coefficients (LPCC) are used for time alignment and V/UV decisions are used to remove bad data, the results show that the conversion function achieves better accuracy and that the proposed method effectively improves the overall performance of voice conversion.
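The data-preparation idea in the abstract can be illustrated with a minimal sketch. It assumes that source and target utterances are already converted to per-frame feature vectors (for example LPCC), that time alignment uses plain dynamic time warping (DTW), and that an aligned frame pair counts as misaligned when its V/UV flags disagree. The abstract does not state the exact alignment or filtering rule, so these are assumptions, and the function names `dtw_path` and `filter_by_vuv` are hypothetical.

```python
import numpy as np

def dtw_path(src, tgt):
    """Return a DTW alignment path between two feature sequences (frames x dims)."""
    n, m = len(src), len(tgt)
    # Pairwise Euclidean distance between every source and target frame.
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the last frame pair to recover the aligned pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def filter_by_vuv(path, src_voiced, tgt_voiced):
    """Keep only aligned pairs whose V/UV flags agree (assumed filtering rule)."""
    return [(i, j) for i, j in path if src_voiced[i] == tgt_voiced[j]]
```

The surviving frame pairs would then serve as the joint source/target training vectors for the mapping function. Fitting that function is outside the scope of this sketch.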
Pages: 181-184
Page count: 4
Related papers
50 in total
  • [21] Voice conversion with SI-DNN and KL divergence based mapping without parallel training data
    Xie, Feng-Long
    Soong, Frank K.
    Li, Haifeng
    [J]. SPEECH COMMUNICATION, 2019, 106 : 57 - 67
  • [22] AN IMPROVED FRAME-UNIT-SELECTION BASED VOICE CONVERSION SYSTEM WITHOUT PARALLEL TRAINING DATA
    Xie, Feng-Long
    Li, Xin-Hui
    Liu, Bo
    Zheng, Yi-Bin
    Meng, Li
    Lu, Li
    Soong, Frank K.
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7754 - 7758
  • [23] Discriminative training for improving Letter-to-Sound conversion performance
    Chen, Yi-Ning
    Liu, Peng
    You, Jia-Li
    Soong, Frank K.
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4649 - 4652
  • [24] A GMM based residual prediction method for voice conversion
    Xia, J
    Yin, JX
    [J]. ISPACS 2005: PROCEEDINGS OF THE 2005 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS, 2005, : 389 - 392
  • [25] A ANN BASED HIGH QUALITY METHOD FOR VOICE CONVERSION
    Chen, Z.
    Zhang, L. H.
    [J]. 2010 6TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS NETWORKING AND MOBILE COMPUTING (WICOM), 2010,
  • [26] Improving supervised learning performance by using fuzzy clustering method to select training data
    Guan, Donghai
    Yuan, Weiwei
    Lee, Young-Koo
    Gavrilov, Andrey
    Lee, Sungyoung
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2008, 19 (4-5) : 321 - 334
  • [27] Cost reduction of training mapping function based on multistep voice conversion
    Masuda, Tsuyoshi
    Shozakai, Makoto
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 693 - +
  • [28] Novel Method for Data Clustering and Mode Selection with Application in Voice Conversion
    Nurminen, Jani
    Tian, Jilei
    Popa, Victor
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2258 - 2261
  • [29] An Improved ANN Method Based on Clustering Optimization for Voice Conversion
    Chen Xiantong
    Zhang Linghua
    [J]. 2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2, 2014, : 464 - 469
  • [30] Frame Correlation Based Autoregressive GMM Method for Voice Conversion
    Li, Xian
    Wang, Zeng-fu
    [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 221 - 225