A SPECTRAL SPACE WARPING APPROACH TO CROSS-LINGUAL VOICE TRANSFORMATION IN HMM-BASED TTS

Cited by: 0
Authors
Wang, Hao [1]
Soong, Frank [1, 2]
Meng, Helen [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
[2] Microsoft Res Asia, Speech Grp, Beijing, Peoples R China
Keywords
cross-lingual; voice transformation; spectral space warping; HMM-based TTS; ALGORITHMS; ASSIGNMENT
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents a new approach to cross-lingual voice transformation in HMM-based TTS that uses only the recordings from two monolingual speakers in different languages (e.g., Mandarin and English). We aim to synthesize one speaker's speech in the other language. We regard the spectral space of any speaker as being composed of universal elementary units (i.e., tied-states) of speech in different languages. Our approach first forces the spectral spaces of the two speakers to have the same number of tied-states, then finds an optimal one-to-one tied-state mapping between the two spectral spaces. The speech trajectory generated in the spectral space of the reference speaker can thus be mapped to a corresponding trajectory in the spectral space of the target speaker. Consequently, we can synthesize high-quality speech in the target monolingual speaker's voice in the other language. The synthesized speech can also be used as training data for a new TTS system.
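The mapping step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: each tied-state is represented only by a mean vector, the cost between two tied-states is their Euclidean distance, and the optimal one-to-one mapping is found by exhaustive search over permutations. The abstract does not state which distance measure or assignment algorithm the authors use. The function names (`state_distance`, `best_one_to_one_mapping`, `map_trajectory`) and the toy data are hypothetical.

```python
from itertools import permutations
import math


def state_distance(mean_a, mean_b):
    """Euclidean distance between two tied-state mean vectors (assumed cost)."""
    return math.dist(mean_a, mean_b)


def best_one_to_one_mapping(ref_states, tgt_states):
    """Find the one-to-one mapping with the smallest total distance.

    Returns (mapping, total_cost), where mapping[i] is the index of the
    target tied-state paired with reference tied-state i.
    Exhaustive search over all permutations: feasible only for a handful
    of states, so this is purely illustrative.
    """
    if len(ref_states) != len(tgt_states):
        raise ValueError("both spectral spaces need the same number of tied-states")
    n = len(ref_states)
    best_perm, best_cost = None, math.inf
    for perm in permutations(range(n)):
        cost = sum(state_distance(ref_states[i], tgt_states[perm[i]])
                   for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


def map_trajectory(ref_state_sequence, mapping):
    """Translate a reference-speaker state sequence into target-speaker states."""
    return [mapping[s] for s in ref_state_sequence]


# Hypothetical toy data: three tied-states per speaker, two-dimensional means.
ref_states = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
tgt_states = [(2.1, 0.1), (0.1, -0.1), (1.0, 1.2)]

mapping, total_cost = best_one_to_one_mapping(ref_states, tgt_states)
mapped = map_trajectory([0, 0, 1, 2, 2], mapping)
print(mapping, round(total_cost, 4), mapped)
```

Exhaustive search grows factorially with the number of states, so at a realistic tied-state count a polynomial-time assignment solver (for example the Hungarian algorithm) would be needed instead.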
Pages: 4874 - 4878
Number of pages: 5
Related papers
50 in total
  • [21] Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping
    Oura, Keiichiro
    Yamagishi, Junichi
    Wester, Mirjam
    King, Simon
    Tokuda, Keiichi
    [J]. SPEECH COMMUNICATION, 2012, 54 (06) : 703 - 714
  • [22] UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS USING TWO-PASS DECISION TREE CONSTRUCTION
    Gibson, Matthew
    Hirsimaki, Teemu
    Karhila, Reima
    Kurimo, Mikko
    Byrne, William
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4642 - 4645
  • [23] Cross-lingual speaker adaptation for HMM-based speech synthesis considering differences between language-dependent average voices
    Peng, Xianglin
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. 2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 605 - 608
  • [24] CROSS VALIDATION AND MINIMUM GENERATION ERROR FOR IMPROVED MODEL CLUSTERING IN HMM-BASED TTS
    Xie, Feng-Long
    Wu, Yi-Jian
    Soong, Frank K.
    [J]. 2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 60 - 63
  • [25] Unsupervised Intralingual and Cross-Lingual Speaker Adaptation for HMM-Based Speech Synthesis Using Two-Pass Decision Tree Construction
    Gibson, Matthew
    Byrne, William
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) : 895 - 904
  • [26] An HMM-Based Approach for Cross-Harmonization of Jazz Standards
    Kaliakatsos-Papakostas, Maximos
    Velenis, Konstantinos
    Pasias, Leandros
    Alexandraki, Chrisoula
    Cambouropoulos, Emilios
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (03):
  • [27] A Minimum V/U Error Approach to F0 Generation in HMM-based TTS
    Qian, Yao
    Soong, Frank
    Wang, Miaomiao
    Wu, Zhizheng
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 400 - 403
  • [29] DNN-Based Cross-Lingual Voice Conversion Using Bottleneck Features
    Reddy, M. Kiran
    Rao, K. Sreenivasa
    [J]. NEURAL PROCESSING LETTERS, 2020, 51 (02) : 2029 - 2042
  • [30] XPTA: an AMR parser for Portuguese based on cross-lingual approach
    Seno, Eloize Rossi Marques
    Caseli, Helena de Medeiros
    Inacio, Marcio Lima
    Anchieta, Rafael Torres
    Ramisch, Renata
    [J]. LINGUAMATICA, 2022, 14 (01) : 49 - 68