Supervisory Data Alignment for Text-Independent Voice Conversion

Cited by: 18
Authors
Tao, Jianhua [1]
Zhang, Meng [1]
Nurminen, Jani [2]
Tian, Jilei [3]
Wang, Xia [3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[2] Nokia Devices R&D, Tampere 33720, Finland
[3] Nokia Res Ctr, Beijing 100176, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data alignment; self-organized learning; supervisory phonetic restriction; text-independent voice conversion; TRANSFORMATION; CONVERGENCE; ALGORITHM;
DOI
10.1109/TASL.2010.2041688
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We propose new supervisory data alignment methods for text-independent voice conversion which do not need parallel training corpora. Phonetic information is used as a restriction during alignment for mapping the data from the source speaker onto the parameter space of a target speaker. Both linear and nonlinear methods are derived by considering alignment accuracy and topology preservation. For the linear alignment, we consider common phoneme clusters of the source and target space as benchmarks and adapt the source data vector to the target space while maintaining the relative phonetic positions among neighborhood clusters. In order to preserve the topological structure of the source parameter space and improve the stability of conversion and the accuracy of the phonetic mapping, a supervised self-organizing learning algorithm considering phonetic restriction is proposed for iteratively improving the alignment outcome of the previous step. Both the linear and nonlinear methods can also be applied in the cross-lingual case. Evaluation results show that the proposed methods improve the performance of alignment in terms of both alignment accuracy and stability for text-independent voice conversion in intra-lingual and cross-lingual cases.
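The linear alignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the inverse-distance weighting, the neighborhood size `k`, and the function names `phoneme_centroids` and `align_linear` are all assumptions introduced here. The sketch treats the centroids of phonemes common to both speakers as benchmarks and shifts a source vector by a weighted blend of the offsets of its nearest phoneme clusters, so that its position relative to those clusters is roughly kept.

```python
import numpy as np


def phoneme_centroids(frames, labels):
    """Return the mean feature vector for each phoneme label."""
    labels = np.asarray(labels)
    return {p: frames[labels == p].mean(axis=0) for p in sorted(set(labels.tolist()))}


def align_linear(x, src_centroids, tgt_centroids, k=2, eps=1e-8):
    """Move source vector x toward the target space (illustrative assumption).

    Only phonemes present for both speakers are used as benchmarks. The k
    nearest source centroids vote, with inverse-distance weights, on the
    offset that is added to x.
    """
    common = sorted(set(src_centroids) & set(tgt_centroids))
    src = np.stack([src_centroids[p] for p in common])
    tgt = np.stack([tgt_centroids[p] for p in common])

    dist = np.linalg.norm(src - x, axis=1)   # distance to each source centroid
    nearest = np.argsort(dist)[:k]           # indices of the k closest clusters
    weights = 1.0 / (dist[nearest] + eps)    # closer clusters count more
    weights /= weights.sum()

    offset = weights @ (tgt[nearest] - src[nearest])
    return x + offset
```

When every common cluster is displaced by the same amount between the two speakers, the sketch reduces to a plain translation of the source vector. The supervised self-organizing refinement mentioned in the abstract is not covered here.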
Pages: 932-943
Number of pages: 12
Related papers
50 in total
  • [1] Text-Independent Voice Conversion Based on Kernel Eigenvoice
    Li, Yanping
    Zhang, Linghua
    Ding, Hui
    [J]. ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, PT I, 2010, 6319 : 432 - +
  • [2] Text-Independent Cross-Language Voice Conversion
    Suendermann, David
    Hoege, Harald
    Bonafonte, Antonio
    Ney, Hermann
    Hirschberg, Julia
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2262 - +
  • [3] Text-independent voice conversion based on unit selection
    Suendermann, David
    Hoege, Harald
    Bonafonte, Antonio
    Ney, Hermann
    Black, Alan
    Narayanan, Shri
    [J]. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 81 - 84
  • [4] Text-independent voice conversion based on state mapped codebook
    Zhang, Meng
    Tao, Jianhua
    Tian, Jilei
    Wang, Xia
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4605 - +
  • [5] Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion
    Wu, Zhi-Zheng
    Kinnunen, Tomi
    Chng, Eng Siong
    Li, Haizhou
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1732 - +
  • [6] Phoneme Cluster Based State Mapping for Text-Independent Voice Conversion
    Zhang, Meng
    Tao, Jianhua
    Nurminen, Jani
    Tian, Jilei
    Wang, Xia
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4281 - +
  • [7] Voice text-independent system for speaker identification
    Babenko, LK
    Makarevich, OB
    Fedorov, VM
    Yurkov, PY
    [J]. IZVESTIYA VYSSHIKH UCHEBNYKH ZAVEDENII RADIOELEKTRONIKA, 2004, 47 (3-4): : 66 - 70
  • [8] Text-Independent Voice Conversion Using Deep Neural Network Based Phonetic Level Features
    Zheng, Huadi
    Cai, Weicheng
    Zhou, Tianyan
    Zhang, Shilei
    Li, Ming
    [J]. 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2872 - 2877
  • [9] A Text-Independent Forced Alignment Method for Automatic Phoneme Segmentation
    Wohlan, Bryce
    Pham, Duc-Son
    Chan, Kit Yan
    Ward, Roslyn
    [J]. AI 2022: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13728 : 585 - 598
  • [10] Significance of Constraining Text in Limited Data Text-independent Speaker Verification
    Das, Rohan Kumar
    Jelil, Sarfaraz
    Prasanna, S. R. Mahadeva
    [J]. 2016 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM), 2016,