MODULATION SPECTRUM-CONSTRAINED TRAJECTORY TRAINING ALGORITHM FOR GMM-BASED VOICE CONVERSION

被引:0
|
作者
Takamichi, Shinnosuke [1 ,2 ]
Toda, Tomoki [1 ]
Black, Alan W. [2 ]
Nakamura, Satoshi [1 ]
机构
[1] Nara Inst Sci & Technol NAIST, Grad Sch Informat Sci, Ikoma, Japan
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
关键词
GMM-based voice conversion; over-smoothing; modulation spectrum. training algorithm;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
This paper presents a novel training algorithm for Gaussian Mixture Model (GMM) -based Voice Conversion (VC). One of the advantages of GMM-based VC is computationally efficient conversion processing enabling to achieve real-time VC applications. On the other hand, the quality of the converted speech is still significantly worse than that of natural speech. In order to address this problem while preserving the computationally efficient conversion processing, the proposed training method enables 1) to use a consistent optimization criterion between training and conversion and 2) to compensate a Modulation Spectrum (MS) of the converted parameter trajectory as a feature sensitively correlated with over-smoothing effects causing quality degradation of the converted speech. The experimental results demonstrate that the proposed algorithm yields significant improvements in term of both the converted speech quality and the conversion accuracy for speaker individuality compared to the basic training algorithm.
引用
收藏
页码:4859 / 4863
页数:5
相关论文
共 50 条
  • [1] Modulation Spectrum-Constrained Trajectory Training Algorithm for HMM-Based Speech Synthesis
    Takamichi, Shinnosuke
    Toda, Tomoki
    Black, Alan W.
    Nakamura, Satoshi
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1206 - 1210
  • [2] Modulation Spectrum-Based Post-Filter for GMM-Based Voice Conversion
    Takamichi, Shinnosuke
    Toda, Tomoki
    Black, Alan W.
    Nakamura, Satoshi
    [J]. 2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
  • [3] Incorporating Global Variance in the Training Phase of GMM-based Voice Conversion
    Hwang, Hsin-Te
    Tsao, Yu
    Wang, Hsin-Min
    Wang, Yih-Ru
    Chen, Sin-Horng
    [J]. 2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013,
  • [4] Modulation spectrum-constrained trajectory error training for mixture density network-based speech synthesis
    Park, Sangjun
    Hahn, Minsoo
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2018, 144 (03): : EL151 - EL157
  • [5] Alleviating the Over-Smoothing Problem in GMM-Based Voice Conversion with Discriminative Training
    Hwang, Hsin-Te
    Tsao, Yu
    Wang, Hsin-Min
    Wang, Yih-Ru
    Chen, Sin-Horng
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3061 - 3065
  • [6] GMM-Based Speaker Gender and Age Classification After Voice Conversion
    Pribil, Jiri
    Pribilova, Anna
    Matousek, Jindrich
    [J]. 2016 FIRST INTERNATIONAL WORKSHOP ON SENSING, PROCESSING AND LEARNING FOR INTELLIGENT MACHINES (SPLINE), 2016,
  • [7] Enhancing a Glossectomy Patient's Speech via GMM-based Voice Conversion
    Tanaka, Kei
    Hara, Sunao
    Abe, Masanobu
    Minagi, Shogo
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [8] Voice Conversion Using Bilinear Model Integrated with Joint GMM-based Classification
    Sun, Xinjian
    Zhang, Xiongwei
    Yang, Jibin
    Cao, Tieyong
    [J]. 2013 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2013, : 1225 - 1228
  • [9] Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech
    Nakamura, Keigo
    Toda, Tomoki
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    [J]. SPEECH COMMUNICATION, 2012, 54 (01) : 134 - 146
  • [10] Voice Conversion Based on Improved GMM and Spectrum with Synchronous Prosody
    Zhang Bing
    Yu Yibiao
    [J]. ICSP: 2008 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-5, PROCEEDINGS, 2008, : 659 - 662