Synthesizing Near Native-accented Speech for a Non-native Speaker by Imitating the Pronunciation and Prosody of a Native Speaker

Cited by: 1
Authors
Chung, Raymond [1, 2]
Mak, Brian [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Clear Water Bay, Hong Kong, Peoples R China
[2] Logist & Supply Chain MultiTech R&D Ctr, Pok Fu Lam, Hong Kong, Peoples R China
Source
Keywords
text-to-speech; neural speech synthesis; accent conversion; foreign accent
DOI
10.21437/Interspeech.2022-11124
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper investigates how to reduce foreign accent in the synthesis of native (L1) speech for a non-native (L2) speaker. We focus on two major aspects of foreign accent: mispronunciation and improper prosody (rhythm, phoneme duration, and pauses). First, to reduce mispronunciations, the mel-spectrograms generated by an L2 text-to-speech (TTS) model are fed to a pre-trained speech recognizer, and the mispronunciation information is fed back to the TTS model during back-propagation to help the model learn correct native mel-spectrograms. Second, to imitate L1 speech prosody, a recent data augmentation (DA) technique originally proposed for speaking style transfer is applied to transfer the L1 speaking style to L2 speakers. The DA technique creates additional L2 speech samples in which L2 speakers try to imitate L1 speech. Automatic speech recognition on native-accented speech synthesized for non-native speakers by the proposed method gives a lower word error rate. The speaker embeddings produced by a pre-trained speaker verifier from the original L2 speakers' speech and from their synthesized speech are highly similar. Finally, subjective mean opinion scores (MOS) on the synthesized speech show that it has good quality and reduced accentedness.
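The two objective measures named in the abstract, word error rate and speaker-embedding similarity, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names are hypothetical, the word error rate is the standard word-level edit distance divided by the reference length, and cosine similarity is assumed as the similarity measure because the abstract does not say which one was used.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length embedding vectors.

    Assumption: cosine similarity stands in for whatever measure the
    speaker verifier in the paper actually uses.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

With these definitions, a recognizer transcript that drops one word from a six-word reference has a word error rate of 1/6, and two embeddings pointing in the same direction have a similarity of 1.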
Pages: 4302-4306
Page count: 5