CYBORG SPEECH: DEEP MULTILINGUAL SPEECH SYNTHESIS FOR GENERATING SEGMENTAL FOREIGN ACCENT WITH NATURAL PROSODY

Authors
Henter, Gustav Eje [1]
Lorenzo-Trueba, Jaime [1]
Wang, Xin [1]
Kondo, Mariko [2]
Yamagishi, Junichi [1, 3]
Affiliations
[1] National Institute of Informatics, Tokyo, Japan
[2] Waseda University, Tokyo, Japan
[3] University of Edinburgh, Edinburgh, Midlothian, Scotland
Keywords
Multilingual speech synthesis; phonetic manipulation; foreign accent; DNN; recurrent neural network; English; intelligibility
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We describe a new application of deep-learning-based speech synthesis, namely multilingual speech synthesis for generating controllable foreign accent. Specifically, we train a DBLSTM-based acoustic model on non-accented multilingual speech recordings from a speaker native in several languages. By copying durations and pitch contours from a pre-recorded utterance of the desired prompt, natural prosody is achieved. We call this paradigm "cyborg speech" as it combines human and machine speech parameters. Segmentally accented speech is produced by interpolating specific quin-phone linguistic features towards phones from the other language that represent non-native mispronunciations. Experiments on synthetic American-English-accented Japanese speech show that subjective synthesis quality matches monolingual synthesis, that natural pitch is maintained, and that naturalistic phone substitutions generate output that is perceived as having an American foreign accent, even though only non-accented training data was used.
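The interpolation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each of the five phone slots in a quin-phone context is encoded as a one-hot vector over a shared phone inventory, and that accent strength is a single blending weight `alpha`. The inventory `PHONES`, the phone labels, and the function names are all hypothetical; the abstract does not specify the feature layout or the interpolation rule actually used.

```python
import numpy as np

# Hypothetical toy phone inventory shared by both languages (an assumption,
# not the inventory used in the paper).
PHONES = ["a", "i", "r_ja", "r_en", "t"]


def one_hot(phone):
    """Return the one-hot code of a phone label over PHONES."""
    vec = np.zeros(len(PHONES))
    vec[PHONES.index(phone)] = 1.0
    return vec


def interpolate_phone(native, substitute, alpha):
    """Blend the code of `native` towards `substitute`.

    alpha = 0 keeps the native phone; alpha = 1 substitutes it fully.
    Linear blending is an assumption made for illustration.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * one_hot(native) + alpha * one_hot(substitute)


def accent_quinphone(quinphone, position, substitute, alpha):
    """Encode five phone labels and interpolate the one at `position`."""
    feats = np.stack([one_hot(p) for p in quinphone])
    feats[position] = interpolate_phone(quinphone[position], substitute, alpha)
    return feats


# Move the second slot halfway from a Japanese-style /r/ to an English-style /r/.
feats = accent_quinphone(["a", "r_ja", "a", "t", "i"],
                         position=1, substitute="r_en", alpha=0.5)
```

In this sketch every row of `feats` still sums to one, so the blended slot can be read as a soft phone identity, and varying `alpha` gives a continuous control over how strongly the substitution is applied.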
Pages: 4799-4803
Number of pages: 5