Phoneme-based spectral voice conversion using temporal decomposition and Gaussian mixture model

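Authors
Nguyen, Binh Phu [1]
Akagi, Masato [1]
Affiliation
[1] Japan Advanced Institute of Science and Technology, School of Information Science, Nomi, Ishikawa 923-1292, Japan
Keywords
spectral voice conversion; temporal decomposition; Gaussian mixture model (GMM)
DOI
Not available
CLC classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809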
Abstract
Among state-of-the-art voice conversion systems, GMM-based methods are regarded as some of the best. However, the quality of the converted speech is still far from natural. There are three main reasons for this degradation: (i) modeling the distribution of acoustic features often uses unstable frames, which degrades the precision of the GMM parameters; (ii) the transformation function may generate discontinuous features if frames are processed independently; and (iii) an over-smoothing effect occurs in each converted frame. This paper presents a new spectral voice conversion method that addresses the first two drawbacks of standard spectral modification methods: insufficient precision of the GMM parameters and insufficient smoothness of the converted spectra between frames. A speech analysis technique called temporal decomposition (TD), which decomposes speech into event targets and event functions, is used to model the spectral evolution effectively. To improve the estimation of the GMM parameters, we use phoneme-based features of the event targets as spectral vectors in the training procedure; this takes into account the relations between spectral parameters within each phoneme and avoids using spectral parameters from transition parts. To enhance the continuity of the speech spectra, we convert only the event targets instead of converting source features to target features frame by frame, and the smoothness of the converted speech is ensured by the shape of the event functions. Experimental results show that the proposed spectral voice conversion method improves both the speech quality and the speaker individuality of the converted speech.
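The abstract describes three steps: decompose speech with TD into event targets and event functions, train a GMM on phoneme-based event targets, and convert only the event targets so that the event functions restore frame-to-frame continuity. The NumPy sketch below illustrates that pipeline under assumptions that do not come from the paper: source and target event targets are already time-aligned pairs (one per phoneme), the GMM is the widely used joint-density formulation whose conversion function is the conditional mean E[y | x], and the event functions are hypothetical piecewise-linear shapes in which neighbouring functions overlap and sum to one. The helper names (`fit_joint_gmm`, `convert`, `event_functions`) and the toy data are illustrative only, not the authors' implementation.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Row-wise log N(x; mean, cov) for a full covariance matrix."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum((diff @ np.linalg.inv(cov)) * diff, axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def posteriors(x, weights, means, covs):
    """p(m | x) for every row of x, computed in the log domain for stability."""
    log_p = np.stack([log_gauss(x, means[m], covs[m]) for m in range(len(weights))], axis=1)
    log_p += np.log(weights)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    return post / post.sum(axis=1, keepdims=True)


def fit_joint_gmm(z, n_components, n_iter=50, seed=0, reg=1e-6):
    """Fit a full-covariance GMM to joint vectors z = [x, y] with plain EM.

    Each row of z is assumed to pair one source event target with the
    time-aligned target event target of the same phoneme (hypothetical setup).
    """
    rng = np.random.default_rng(seed)
    n, d = z.shape
    means = z[rng.choice(n, n_components, replace=False)]
    covs = np.array([np.cov(z, rowvar=False) + reg * np.eye(d)] * n_components)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        gamma = posteriors(z, weights, means, covs)  # E-step: responsibilities
        nk = gamma.sum(axis=0) + 1e-12               # M-step: soft counts
        weights = nk / nk.sum()
        means = (gamma.T @ z) / nk[:, None]
        for m in range(n_components):
            diff = z - means[m]
            covs[m] = (gamma[:, m, None] * diff).T @ diff / nk[m] + reg * np.eye(d)
    return weights, means, covs


def convert(x, weights, means, covs, dx):
    """GMM regression: E[y | x] = sum_m p(m | x) (mu_y + S_yx S_xx^-1 (x - mu_x))."""
    mx, my = means[:, :dx], means[:, dx:]
    sxx, syx = covs[:, :dx, :dx], covs[:, dx:, :dx]
    post = posteriors(x, weights, mx, sxx)
    y = np.zeros((x.shape[0], my.shape[1]))
    for m in range(len(weights)):
        # (x - mu_x) S_xx^-1 S_yx^T is the row-vector form of S_yx S_xx^-1 (x - mu_x)
        y += post[:, m, None] * (my[m] + (x - mx[m]) @ np.linalg.solve(sxx[m], syx[m].T))
    return y


def event_functions(n_frames, centers):
    """Hypothetical piecewise-linear event functions, one row per event.

    Neighbouring functions overlap and sum to 1 at every frame, so each frame
    is a convex combination of at most two adjacent event targets.
    """
    t = np.arange(n_frames)
    eye = np.eye(len(centers))
    return np.stack([np.interp(t, centers, eye[k]) for k in range(len(centers))])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy 2-D "spectral" event targets for three phonemes (synthetic, not real data).
    phoneme_means = np.array([[0.0, 0.0], [3.0, 1.0], [1.0, 4.0]])
    labels = rng.integers(0, 3, size=300)
    x_train = phoneme_means[labels] + 0.3 * rng.standard_normal((300, 2))
    a = np.array([[0.8, 0.2], [-0.1, 1.1]])
    y_train = x_train @ a.T + np.array([0.5, -0.5]) + 0.05 * rng.standard_normal((300, 2))
    w, mu, cov = fit_joint_gmm(np.hstack([x_train, y_train]), n_components=3)

    # Convert only the event targets of a source utterance ...
    src_targets = phoneme_means[[0, 1, 2, 1]]
    conv_targets = convert(src_targets, w, mu, cov, dx=2)
    # ... and let the source event functions fill in every frame between them.
    phi = event_functions(40, centers=[2, 14, 26, 37])
    frames = phi.T @ conv_targets
    print(frames.shape)  # (40, 2)
```

Because only the four event targets pass through the GMM and each frame is a weighted sum of neighbouring converted targets, the converted trajectory moves gradually from one target to the next instead of being mapped frame by frame; this mirrors the smoothness argument in the abstract, demonstrated here on synthetic vectors rather than real spectra.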
Pages: 222-227
Page count: 6