CROSS-LINGUAL FRAME SELECTION METHOD FOR POLYGLOT SPEECH SYNTHESIS

Cited by: 0

Authors:

Chen, Chia-Ping [1]

Huang, Yi-Chin [2]

Wu, Chung-Hsien [2]

Lee, Kuan-De [2]

Affiliations:

[1] Natl Sun Yat Sen Univ, Dept Comp Sci & Engn, Kaohsiung 80424, Taiwan

[2] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan, Taiwan

Source:

2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2012

Keywords:

polyglot speech synthesis; frame selection; articulatory features; auditory features

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

A novel approach is proposed for creating a polyglot speech synthesis system without the need to collect speech data from a bilingual (or multilingual) speaker, which is often expensive or even infeasible. Given a target speaker with data in the first language (Mandarin in this study), the basic idea is to construct artificial utterances in the second language (English) by selecting speech sample frames of the given speaker from the first-language data. As the speaker need not be polyglot, this method is generally applicable to any speaker and any language. In the search for the optimal frame sequence, the candidate set is constrained by a decision tree for phone segments in the speech data of both languages, and the cost function depends on context-dependent articulatory and auditory features. Evaluation results show that good performance in terms of similarity (speaker identity) and naturalness (speech quality) can be achieved with the proposed method.
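
The frame-sequence search described in the abstract can be illustrated with a generic dynamic-programming (Viterbi-style) selection. The abstract does not give the algorithm or the cost definitions, so the following is only a minimal sketch under stated assumptions: the function name `select_frames`, the `join_weight` parameter, and the use of Euclidean distance for both the target cost and the concatenation cost are hypothetical stand-ins for the paper's articulatory and auditory feature costs, and the candidate sets are passed in directly rather than derived from a decision tree.

```python
from math import dist  # Euclidean distance, Python 3.8+


def select_frames(targets, candidates, features, join_weight=1.0):
    """Return (path, total_cost) for the cheapest frame sequence.

    targets:    target feature vectors, one per output frame
    candidates: candidates[t] lists the source frame ids allowed at step t
                (assumed to be given; the paper constrains this set with a
                decision tree, which is not modelled here)
    features:   dict mapping source frame id -> feature vector
    Both costs are plain Euclidean distances, an illustrative assumption.
    """
    if not targets:
        return [], 0.0

    # best[t][c] = (cost of the cheapest path ending in frame c, predecessor)
    best = [{c: (dist(targets[0], features[c]), None) for c in candidates[0]}]

    for t in range(1, len(targets)):
        layer = {}
        for c in candidates[t]:
            target_cost = dist(targets[t], features[c])
            # Cheapest predecessor, counting the weighted concatenation cost.
            prev, cost = min(
                (
                    (p, pc + join_weight * dist(features[p], features[c]))
                    for p, (pc, _) in best[t - 1].items()
                ),
                key=lambda item: item[1],
            )
            layer[c] = (cost + target_cost, prev)
        best.append(layer)

    # Backtrack from the cheapest final frame.
    last = min(best[-1], key=lambda c: best[-1][c][0])
    total = best[-1][last][0]
    path = [last]
    for t in range(len(targets) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, total


# Toy example with one-dimensional features (made-up values).
toy_features = {0: (0.0,), 1: (1.0,), 2: (2.0,), 3: (5.0,)}
toy_targets = [(0.0,), (1.0,), (2.0,)]
toy_candidates = [[0, 3], [1, 3], [2, 3]]
toy_path, toy_cost = select_frames(toy_targets, toy_candidates, toy_features)
```

In this toy case the search prefers frames 0, 1, 2, whose features match the targets exactly, over repeating frame 3, which would have zero concatenation cost but a large target cost. The sketch assumes every candidate set is non-empty.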

Pages: 4521-4524

Page count: 4
