Bilingual Voice Conversion by Weighted Frequency Warping Based on Formant Space

Cited by: 0

Authors:

Yun, Young-Sun [1]

Ladner, Richard E. [2]

Affiliations:

[1] Hannam Univ, Dept Informat & Commun Engn, Taejon 306791, South Korea

[2] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA

Source:

TEXT, SPEECH, AND DIALOGUE, TSD 2013 | 2013 / Volume 8082

Keywords:

voice conversion; weighted frequency warping; formant space; normalization

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Voice conversion is a technique that transforms the source speaker's individuality to that of the target speaker. In this paper, we propose a simple and intuitive voice conversion algorithm that does not use training data between different languages, but uses text-to-speech generated speech rather than real recorded voices. The suggested method finds the transformed frequency by formant space warping. The formant space comprises four representative monophthongs for each language. The warping functions are represented by piecewise linear equations using pairs of four formants at matched monophthongs. Experimental results show the potential of the proposed method.
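As a rough illustration of the piecewise linear warping idea mentioned in the abstract, the sketch below builds a frequency map for a single matched monophthong from four source/target formant pairs. It is not the authors' implementation: the function name `make_warp`, the formant values, the 8000 Hz upper limit, and the choice to pin the map at 0 Hz and at the upper limit are all assumptions made for illustration, and the weighting across several monophthongs is not shown.

```python
import numpy as np

def make_warp(src_formants, tgt_formants, nyquist=8000.0):
    """Return a piecewise-linear map from source to target frequency (Hz).

    Breakpoints are the (source, target) formant pairs; the end points
    (0, 0) and (nyquist, nyquist) are an assumption of this sketch.
    """
    src = np.concatenate(([0.0], np.asarray(src_formants, dtype=float), [nyquist]))
    tgt = np.concatenate(([0.0], np.asarray(tgt_formants, dtype=float), [nyquist]))
    if np.any(np.diff(src) <= 0) or np.any(np.diff(tgt) <= 0):
        raise ValueError("formants must be strictly increasing and below nyquist")
    # np.interp interpolates linearly between consecutive breakpoints.
    return lambda f: np.interp(f, src, tgt)

# Hypothetical F1..F4 values (Hz) for one matched monophthong; not from the paper.
src_a = [700.0, 1200.0, 2500.0, 3500.0]
tgt_a = [800.0, 1400.0, 2700.0, 3800.0]

warp = make_warp(src_a, tgt_a)
warped = warp(np.array([0.0, 700.0, 950.0, 3500.0, 8000.0]))
```

Each source formant lands exactly on its matched target formant, and frequencies between two formants are interpolated linearly, so 950 Hz (halfway between the source F1 and F2) maps halfway between the target F1 and F2.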

Citation

Pages: 137-144

Page count: 8

