CROSS-LINGUAL FRAME SELECTION METHOD FOR POLYGLOT SPEECH SYNTHESIS

Authors
Chen, Chia-Ping [1]
Huang, Yi-Chin [2]
Wu, Chung-Hsien [2]
Lee, Kuan-De [2]
Affiliations
[1] Natl Sun Yat Sen Univ, Dept Comp Sci & Engn, Kaohsiung 80424, Taiwan
[2] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan, Taiwan
Keywords
polyglot speech synthesis; frame selection; articulatory features; auditory features
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
A novel approach is proposed for creating a polyglot speech synthesis system without collecting speech data from a bilingual (or multilingual) speaker, which is often expensive or even infeasible. Given a target speaker with data in the first language (Mandarin in this study), the basic idea is to construct artificial utterances in the second language (English) by selecting speech sample frames of the given speaker in the first language. As the speaker need not be polyglot, this method is generally applicable to any speaker and any languages. In the search for the optimal frame sequence, the candidate set is constrained by a decision tree for phone segments in the speech data of both languages, and the cost function depends on context-dependent articulatory and auditory features. Evaluation results show that good performance regarding similarity (speaker identity) and naturalness (speech quality) can be achieved with the proposed method.
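The abstract describes a search for a minimum-cost frame sequence over a constrained candidate set. The abstract gives no implementation details, so the following is only a minimal sketch of that general idea, not the authors' algorithm. The function name `select_frames`, the Euclidean distance, the weight `w_concat`, and the split into a target cost and a join cost are all illustrative assumptions. The per-position candidate lists stand in for whatever the decision tree would admit.

```python
import math

def select_frames(targets, candidates, w_concat=1.0):
    """Choose one candidate frame per position with dynamic programming.

    targets:    one feature vector per position (hypothetical features
                describing the desired second-language frame).
    candidates: one list of feature vectors per position (hypothetical
                first-language frames allowed at that position).
    Returns the index of the chosen candidate at each position.
    """
    # cost[t][k]: cheapest total cost of a path ending at candidate k of position t
    cost = [[math.dist(c, targets[0]) for c in candidates[0]]]
    back = []  # back[t-1][k]: best predecessor index for candidate k of position t
    for t in range(1, len(targets)):
        row, ptr = [], []
        for cur in candidates[t]:
            # Accumulated cost plus an assumed join cost from each predecessor.
            joins = [cost[-1][j] + w_concat * math.dist(prev, cur)
                     for j, prev in enumerate(candidates[t - 1])]
            j_best = min(range(len(joins)), key=joins.__getitem__)
            # Add the assumed target cost of the current candidate.
            row.append(joins[j_best] + math.dist(cur, targets[t]))
            ptr.append(j_best)
        cost.append(row)
        back.append(ptr)
    # Trace the cheapest path back from the last position.
    k = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return path[::-1]
```

Calling `select_frames(targets, candidates)` with two-dimensional toy vectors returns one index per position. Raising `w_concat` favors smooth transitions between neighboring frames over closeness to each target.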
Pages: 4521-4524
Number of pages: 4