Bilingual Voice Conversion by Weighted Frequency Warping Based on Formant Space

Authors
Yun, Young-Sun [1]
Ladner, Richard E. [2]
Affiliations
[1] Hannam Univ, Dept Informat & Commun Engn, Taejon 306791, South Korea
[2] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Keywords
voice conversion; weighted frequency warping; formant space; normalization
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Voice conversion is a technique that transforms the individuality of a source speaker's voice into that of a target speaker. In this paper, we propose a simple and intuitive voice conversion algorithm that does not rely on training data spanning the two languages and that operates on speech generated by text-to-speech (TTS) systems rather than on real recorded voices. The proposed method obtains the converted frequencies by warping the formant space. For each language, the formant space consists of four representative monophthongs. The warping functions are piecewise linear and are defined by pairs of corresponding formants (four per vowel) at matched monophthongs. Experimental results indicate the potential of the proposed method.
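The piecewise linear warping described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the anchor points are the four matched formant pairs of a single monophthong plus fixed end points at 0 Hz and at the Nyquist frequency, and it omits the weighting across monophthongs, which the abstract does not specify. The function name make_piecewise_linear_warp and the formant values used below are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def make_piecewise_linear_warp(
    src_formants: Sequence[float],
    tgt_formants: Sequence[float],
    nyquist: float,
) -> Callable[[float], float]:
    """Return f -> warped f, linear between matched formant anchors.

    Assumed anchors (hypothetical, not from the paper): (0, 0),
    (F1_src, F1_tgt), ..., (F4_src, F4_tgt), (nyquist, nyquist).
    The paper's weighting step is NOT modeled here.
    """
    if len(src_formants) != len(tgt_formants):
        raise ValueError("source and target need the same number of formants")
    xs = [0.0, *map(float, src_formants), float(nyquist)]
    ys = [0.0, *map(float, tgt_formants), float(nyquist)]
    # Strictly increasing anchors keep the warp monotonic and invertible.
    for seq in (xs, ys):
        if any(b <= a for a, b in zip(seq, seq[1:])):
            raise ValueError("formants must increase and lie in (0, nyquist)")

    def warp(f: float) -> float:
        # Clamp frequencies outside the analysis band to the fixed end points.
        if f <= 0.0:
            return 0.0
        if f >= nyquist:
            return float(nyquist)
        # Segment index i such that xs[i] <= f < xs[i + 1].
        i = bisect_right(xs, f) - 1
        t = (f - xs[i]) / (xs[i + 1] - xs[i])
        # Linear interpolation between the matched target anchors.
        return ys[i] + t * (ys[i + 1] - ys[i])

    return warp


if __name__ == "__main__":
    # Hypothetical formant values (Hz) for one matched monophthong pair.
    warp = make_piecewise_linear_warp([700, 1200, 2500, 3500],
                                      [800, 1400, 2700, 3700], 8000)
    print([round(warp(f), 1) for f in (0, 700, 950, 5750, 8000)])
```

With these hypothetical values, a source component at 950 Hz, halfway between the source F1 and F2, is mapped to 1100 Hz, halfway between the target F1 and F2. Fixing the end points keeps the whole band covered and the mapping monotonic, so spectral order is preserved after warping.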
Pages: 137-144 (8 pages)