A SPECTRAL SPACE WARPING APPROACH TO CROSS-LINGUAL VOICE TRANSFORMATION IN HMM-BASED TTS

Cited by: 0
Authors
Wang, Hao [1]
Soong, Frank [1, 2]
Meng, Helen [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
[2] Microsoft Res Asia, Speech Grp, Beijing, Peoples R China
Keywords
cross-lingual; voice transformation; spectral space warping; HMM-based TTS; ALGORITHMS; ASSIGNMENT;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper presents a new approach to cross-lingual voice transformation in HMM-based TTS with only the recordings from two monolingual speakers in different languages (e.g. Mandarin and English). We aim to synthesize one speaker's speech in the other language. We regard the spectral space of any speaker to be composed of universal elementary units (i.e. tied-states) of speech in different languages. Our approach first forces the spectral spaces of the two speakers to have the same number of tied-states. Then we find an optimal one-to-one tied-state mapping between the two spectral spaces. Hence, the mapped speech trajectory in the spectral space of the target speaker can be found according to that generated in the spectral space of the reference speaker. Consequently, we can synthesize high-quality speech for the target monolingual speaker's voice in the other language. This can also be used as training data for a new TTS system.
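The one-to-one tied-state mapping described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not state the cost function or the search procedure, so the sketch uses squared Euclidean distance between per-state mean vectors and an exhaustive search over permutations, which is only practical for toy sizes. The function names `best_one_to_one_mapping` and `map_trajectory` and the sample data are hypothetical.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_one_to_one_mapping(ref_means, tgt_means):
    """Find the one-to-one state mapping with the lowest total cost.

    ASSUMPTION: the cost is the squared distance between state means;
    the abstract does not specify the criterion actually used.
    Exhaustive search is O(n!) and is only meant for toy examples.
    """
    n = len(ref_means)
    if n != len(tgt_means):
        raise ValueError("both spaces must have the same number of states")
    cost = [[sq_dist(r, t) for t in tgt_means] for r in ref_means]
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost


def map_trajectory(ref_state_seq, mapping):
    """Replace each reference state index with its mapped target state."""
    return [mapping[s] for s in ref_state_seq]


# Hypothetical 2-D state means for three tied-states per speaker.
ref_means = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
tgt_means = [[3.8, 0.2], [0.1, -0.1], [1.2, 0.9]]

mapping, total_cost = best_one_to_one_mapping(ref_means, tgt_means)
mapped = map_trajectory([0, 0, 1, 2, 2], mapping)
print(mapping, mapped)
```

For realistic numbers of tied-states, a polynomial-time assignment solver would replace the exhaustive loop; the structure of the mapping step stays the same.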
Pages: 4874-4878
Number of pages: 5