A Revisit to Feature Handling for High-quality Voice Conversion Based on Gaussian Mixture Model

Cited by: 0
Authors
Suda, Hitoshi [1]
Kotani, Gaku [1]
Takamichi, Shinnosuke [2]
Saito, Daisuke [1]
Affiliations
[1] Univ Tokyo, Grad Sch Engn, Tokyo, Japan
[2] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan
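Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809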
Abstract
This paper discusses how the handling of acoustic features influences the quality of generated sounds in voice conversion (VC) systems based on Gaussian mixture models (GMMs). Efforts to improve the quality of VC have focused largely on the mapping models used to convert acoustic features, whereas the other components have rarely been studied. The experimental results show that the quality of VC depends not only on the models but also on the methods used to analyze and synthesize utterances. This paper also investigates filtering methods for synthesis. To avoid the buzzy sounds generated by vocoders, differential-spectrum compensation is applied as an alternative method of synthesizing waveforms. Although mel log spectral approximation (MLSA) filtering is traditionally used for differential-spectrum compensation, the experimental results indicate that the approximation in MLSA filtering degrades the quality of the synthesized speech. To avoid this approximation, this paper introduces an alternative filtering method, named SP-WORLD, inspired by the WORLD vocoder framework. The subjective experiments demonstrate that SP-WORLD is comparable to MLSA filtering and outperforms it in some cases.
Pages: 816-822
Page count: 7