Voice conversion based on matrix variate Gaussian mixture model using multiple frame features

被引:0
|
作者
Yang, Yi [1 ]
Uchida, Hidetsugu [1 ]
Saito, Daisuke [1 ]
Minematsu, Nobuaki [1 ]
机构
[1] Univ Tokyo, Tokyo, Japan
关键词
voice conversion; Gaussian mixture model; matrix variate Gaussian mixture model; multiple frame features;
D O I
10.21437/Interspeech.2016-705
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
This paper presents a novel voice conversion method based on matrix variate Gaussian mixture model (MV-GMM) using features of multiple frames. In voice conversion studies, approaches based on Gaussian mixture models (GMM) are still widely utilized because of their flexibility and easiness in handling. They treat the joint probability density function (PDF) of feature vectors from source and target speakers as that of joint vectors of the two vectors. Addition of dynamic features to the feature vectors in GMM-based approaches achieves certain performance improvements because the correlation between multiple frames is taken into account. Recently, a voice conversion framework based on MV-GMM, in which the joint PDF is modeled in a matrix variate space, has been proposed and it is able to precisely model both the characteristics of the feature spaces and the relation between the source and target speakers. In this paper, in order to additionally model the correlation between multiple frames in the framework more consistently, MV-GMM is constructed in a matrix variate space containing the features of neighboring frames. Experimental results show that an certain performance improvement in both objective and subjective evaluations is observed.
引用
收藏
页码:302 / 306
页数:5
相关论文
共 50 条
  • [1] VOICE CONVERSION BASED ON MATRIX VARIATE GAUSSIAN MIXTURE MODEL
    Saito, Daisuke
    Doi, Hidenobu
    Minematsu, Nobuaki
    Hirose, Keikichi
    [J]. 2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 567 - 571
  • [2] Voice conversion using Viterbi algorithm based on Gaussian mixture model
    Jian Zhi-Hua
    Yang Zhen
    [J]. 2007 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS, VOLS 1 AND 2, 2007, : 40 - 43
  • [3] Voice Conversion Using Structrued Gaussian Mixture Model
    Zeng, Daojian
    Yu, Yibiao
    [J]. 2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 541 - 544
  • [4] Voice conversion using canonical correlation analysis based on Gaussian mixture model
    Jian, ZhiHua
    Yang, Zhen
    [J]. SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 1, PROCEEDINGS, 2007, : 210 - +
  • [5] Voice conversion algorithm using phoneme Gaussian mixture model
    Sheng, L
    Yin, JX
    Huang, JC
    [J]. PROCEEDINGS OF THE 2004 INTERNATIONAL SYMPOSIUM ON INTELLIGENT MULTIMEDIA, VIDEO AND SPEECH PROCESSING, 2004, : 5 - 8
  • [6] Voice Conversion Using Gaussian Mixture Models
    D'souza, Kevin
    Talele, K. T. V.
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMMUNICATION, INFORMATION & COMPUTING TECHNOLOGY (ICCICT), 2015,
  • [7] Voice conversion using structured Gaussian mixture model in cepstrum eigenspace
    LI Yangchun
    YU Yibiao
    [J]. Chinese Journal of Acoustics, 2015, 34 (03) : 325 - 336
  • [8] Phoneme-based spectral voice conversion using temporal decomposition and Gaussian mixture model
    Nguyen, Binh Phu
    Akagi, Masato
    [J]. 2008 SECOND INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS, 2008, : 222 - 227
  • [9] Efficient Gaussian Mixture Model Evaluation in Voice Conversion
    Tian, Jilei
    Nurminen, Jani
    Popa, Victor
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2282 - 2285
  • [10] ON USING NON-LINEAR CANONICAL CORRELATION ANALYSIS FOR VOICE CONVERSION BASED ON GAUSSIAN MIXTURE MODEL
    Jian Zhihua Yang Zhen(School of Communication Engineering
    [J]. Journal of Electronics(China), 2010, 27 (01) : 1 - 7