A SPECTRAL SPACE WARPING APPROACH TO CROSS-LINGUAL VOICE TRANSFORMATION IN HMM-BASED TTS

Cited by: 0
Authors
Wang, Hao [1]
Soong, Frank [1, 2]
Meng, Helen [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
[2] Microsoft Res Asia, Speech Grp, Beijing, Peoples R China
Keywords
cross-lingual; voice transformation; spectral space warping; HMM-based TTS; ALGORITHMS; ASSIGNMENT;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
This paper presents a new approach to cross-lingual voice transformation in HMM-based TTS with only the recordings from two monolingual speakers in different languages (e.g. Mandarin and English). We aim to synthesize one speaker's speech in the other language. We regard the spectral space of any speaker to be composed of universal elementary units (i.e. tied-states) of speech in different languages. Our approach first forces the spectral spaces of the two speakers to have the same number of tied-states. Then we find an optimal one-to-one tied-state mapping between the two spectral spaces. Hence, the mapped speech trajectory in the spectral space of the target speaker can be found according to that generated in the spectral space of the reference speaker. Consequently, we can synthesize high-quality speech for the target monolingual speaker's voice in the other language. This can also be used as training data for a new TTS system.
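The one-to-one tied-state mapping described in the abstract can be read as an assignment problem. The sketch below is only an illustration under stated assumptions, not the authors' method: it assumes each tied-state is summarized by a mean spectral vector, assumes Euclidean distance as the cost, and uses exhaustive search, which is feasible only for a handful of states. The function name `optimal_one_to_one_mapping` and the toy vectors are hypothetical.

```python
from itertools import permutations
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def optimal_one_to_one_mapping(ref_states, tgt_states):
    """Return (mapping, total_cost) minimizing the summed distance.

    mapping[i] is the index of the target state assigned to reference
    state i. Assumption: both speakers already have the same number of
    tied-states, as the abstract requires. Exhaustive search is used
    here purely for clarity; its cost grows factorially with the
    number of states.
    """
    if len(ref_states) != len(tgt_states):
        raise ValueError("both spaces must have the same number of states")
    n = len(ref_states)
    # Pairwise cost matrix: cost[i][j] = distance(ref i, target j).
    cost = [[euclidean(r, t) for t in tgt_states] for r in ref_states]
    best_perm, best_cost = None, math.inf
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost


# Toy data (hypothetical): three 2-D state means per speaker.
ref = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
tgt = [(4.2, 0.1), (0.1, -0.1), (0.9, 1.2)]
mapping, total = optimal_one_to_one_mapping(ref, tgt)
print(mapping, round(total, 4))
```

With thousands of tied-states, exhaustive search is out of reach; a polynomial-time assignment solver such as the Hungarian algorithm would be the usual substitute. The abstract does not say which solver or distance measure the authors use.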
Pages: 4874-4878
Page count: 5
Related papers
50 in total
  • [31] A parametric linguistics based approach for cross-lingual web querying
    Kapetanios, Epaminondas
    Sugumaran, Vijayan
    Tanase, Diana
    [J]. DATA & KNOWLEDGE ENGINEERING, 2008, 66 (01) : 35 - 52
  • [32] An Analysis of Language Mismatch in HMM State Mapping-Based Cross-Lingual Speaker Adaptation
    Liang, Hui
    Dines, John
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 622 - 625
  • [33] Using Spectral Fluctuation of Speech in multi-feature HMM-based voice activity detection
    Espi, Miquel
    Miyabe, Shigeki
    Nishimoto, Takuya
    Ono, Nobutaka
    Sagayama, Shigeki
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2624 - 2627
  • [34] Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control
    Yamamoto, Ryuichi
    Shirahata, Yuma
    Kawamura, Masaya
    Tachibana, Kentaro
    [J]. arXiv,
  • [35] Cross-Lingual Voice Conversion-Based Polyglot Speech Synthesizer for Indian Languages
    Ramani, B.
    Jeeva, Actlin M. P.
    Vijayalakshmi, P.
    Nagarajan, T.
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 775 - 779
  • [36] A word embedding-based approach to cross-lingual topic modeling
    Chia-Hsuan Chang
    San-Yih Hwang
    [J]. Knowledge and Information Systems, 2021, 63 : 1529 - 1555
  • [37] A word embedding-based approach to cross-lingual topic modeling
    Chang, Chia-Hsuan
    Hwang, San-Yih
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2021, 63 (06) : 1529 - 1555
  • [38] Domain Adaptation and Language Conditioning to Improve Phonetic Posteriorgram Based Cross-Lingual Voice Conversion
    Hsu, Pin-Chieh
    Minematsu, Nobuaki
    Saito, Daisuke
    [J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 950 - 956
  • [39] Mandarin-Tibetan Cross-Lingual Voice Conversion System Based on Deep Neural Network
    Gan, Zhenye
    Xing, Xiaotian
    Yang, Hongwu
    Zhao, Guangying
    [J]. PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018), 2018, : 67 - 71
  • [40] A FULL TRAINING FRAMEWORK OF CROSS-STREAM DEPENDENCE MODELLING FOR HMM-BASED SINGING VOICE SYNTHESIS
    Wang, Xin
    Dong, Minghui
    Ling, Zhen-Hua
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5165 - 5169