CROSS-LINGUAL FRAME SELECTION METHOD FOR POLYGLOT SPEECH SYNTHESIS

被引:0
|
作者
Chen, Chia-Ping [1 ]
Huang, Yi-Chin [2 ]
Wu, Chung-Hsien [2 ]
Lee, Kuan-De [2 ]
机构
[1] Natl Sun Yat Sen Univ, Dept Comp Sci & Engn, Kaohsiung 80424, Taiwan
[2] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan, Taiwan
关键词
polyglot speech synthesis; frame selection; articulatory features; auditory features;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
A novel approach is proposed to creating a polyglot speech synthesis system without the need of collecting speech data from a bilingual (or multilingual) speaker, which is often expensive or even infeasible. Given a target speaker with data in the first language (Mandarin in this study), the basic idea is to construct artificial utterances in the second language (English) via selection of speech sample frames of the given speaker in the first language. As the speaker needs not be polyglot, this method is generally applicable to any speaker and any languages. In the search for optimal frame sequence selection, the candidate set is constrained by a decision tree for phone segments in the speech data of both languages, and the cost function depends on the context-dependent articulatory and auditory features. Evaluation results show that good performance regarding similarity (speaker identity) and naturalness (speech quality) can be achieved with the proposed method.
引用
收藏
页码:4521 / 4524
页数:4
相关论文
共 50 条
  • [1] Polyglot Speech Synthesis Based on Cross-Lingual Frame Selection Using Auditory and Articulatory Features
    Chen, Chia-Ping
    Huang, Yi-Chin
    Wu, Chung-Hsien
    Lee, Kuan-De
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) : 1558 - 1570
  • [2] Cross-Lingual Voice Conversion-Based Polyglot Speech Synthesizer for Indian Languages
    Ramani, B.
    Jeeva, Actlin M. P.
    Vijayalakshmi, P.
    Nagarajan, T.
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 775 - 779
  • [3] Frame Alignment Method for Cross-lingual Voice Conversion
    Erro, Daniel
    Moreno, Asuncion
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1533 - 1536
  • [4] IMPROVING CROSS-LINGUAL SPEECH SYNTHESIS WITH TRIPLET TRAINING SCHEME
    Ye, Jianhao
    Zhou, Hongbin
    Su, Zhiba
    He, Wendi
    Ren, Kaimeng
    Li, Lin
    Lu, Heng
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6072 - 6076
  • [5] Cross-lingual Dialog Model for Speech to Speech Translation
    Ettelaie, Emil
    Georgiou, Panayiotis G.
    Narayanan, Shrikanth
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1173 - 1176
  • [6] Model Selection for Cross-Lingual Transfer
    Chen, Yang
    Ritter, Alan
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5675 - 5687
  • [7] Exploring Cross-lingual Singing Voice Synthesis Using Speech Data
    Cao, Yuewen
    Liu, Songxiang
    Kang, Shiyin
    Hu, Na
    Liu, Peng
    Liu, Xunying
    Su, Dan
    Yu, Dong
    Meng, Helen
    [J]. 2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [8] CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS
    Wu, Yi-Jian
    King, Simon
    Tokuda, Keiichi
    [J]. 2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 9 - 12
  • [9] Deep Learning for Mandarin-Tibetan Cross-Lingual Speech Synthesis
    Zhang, Weizhao
    Yang, Hongwu
    Bu, Xiaolong
    Wang, Lili
    [J]. IEEE ACCESS, 2019, 7 : 167884 - 167894
  • [10] Cross-Lingual Neural Network Speech Synthesis Based on Multiple Embeddings
    Nosek, Tijana, V
    Suzic, Sinisa B.
    Pekar, Darko J.
    Obradovic, Radovan J.
    Secujski, Milan S.
    Delic, Vlado D.
    [J]. INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2021, 7 (02): : 110 - 120