A SPECTRAL SPACE WARPING APPROACH TO CROSS-LINGUAL VOICE TRANSFORMATION IN HMM-BASED TTS

Cited by: 0

Authors:

Wang, Hao [1]

Soong, Frank [1,2]

Meng, Helen [1]

Affiliations:

[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China

[2] Microsoft Res Asia, Speech Grp, Beijing, Peoples R China

Source:

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015

Keywords:

cross-lingual; voice transformation; spectral space warping; HMM-based TTS; ALGORITHMS; ASSIGNMENT;

DOI:

Not available

Chinese Library Classification (CLC):

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a new approach to cross-lingual voice transformation in HMM-based TTS with only the recordings from two monolingual speakers in different languages (e.g. Mandarin and English). We aim to synthesize one speaker's speech in the other language. We regard the spectral space of any speaker to be composed of universal elementary units (i.e. tied-states) of speech in different languages. Our approach first forces the spectral spaces of the two speakers to have the same number of tied-states. Then we find an optimal one-to-one tied-state mapping between the two spectral spaces. Hence, the mapped speech trajectory in the spectral space of the target speaker can be found according to that generated in the spectral space of the reference speaker. Consequently, we can synthesize high-quality speech for the target monolingual speaker's voice in the other language. This can also be used as training data for a new TTS system.
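The one-to-one tied-state mapping described in the abstract can be read as a linear assignment problem: each tied state of the reference speaker is paired with exactly one tied state of the target speaker so that the total pairing cost is minimal. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not name the distance measure or the solver, so the squared Euclidean distance between state mean vectors, the Hungarian method, the toy data, and all function and variable names below are assumptions made for illustration.

```python
# Illustrative sketch only: the distance measure, the solver, and the toy
# data are assumptions, not details taken from the paper.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cost_matrix(ref_means, tgt_means):
    """cost[i][j] = distance between reference state i and target state j."""
    return [[squared_distance(r, t) for t in tgt_means] for r in ref_means]


def optimal_one_to_one_mapping(cost):
    """Solve a square assignment problem with the Hungarian method.

    Returns a list `mapping` where mapping[i] is the column (target state)
    assigned to row i (reference state), minimizing the total cost.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)    # row potentials (1-indexed)
    v = [0.0] * (n + 1)    # column potentials (1-indexed)
    p = [0] * (n + 1)      # p[j] = row currently matched to column j
    way = [0] * (n + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    mapping = [0] * n
    for j in range(1, n + 1):
        mapping[p[j] - 1] = j - 1
    return mapping


# Hypothetical tied-state mean vectors, with the same number of states for
# both speakers, as the abstract requires before the mapping step.
ref_means = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
tgt_means = [[5.2, 4.9], [0.1, -0.1], [0.9, 1.2]]

cost = cost_matrix(ref_means, tgt_means)
mapping = optimal_one_to_one_mapping(cost)
total = sum(cost[i][mapping[i]] for i in range(len(mapping)))
print(mapping, total)
```

Unlike a greedy nearest-neighbour pairing, an assignment solver guarantees that no two reference states share a target state while keeping the total cost minimal. For state inventories of realistic size, a library routine such as SciPy's `scipy.optimize.linear_sum_assignment` would likely be a more practical choice than this pure-Python version.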

Pages: 4874-4878

Page count: 5
