Synthesizing Near Native-accented Speech for a Non-native Speaker by Imitating the Pronunciation and Prosody of a Native Speaker

Times cited: 1

Authors:

Chung, Raymond [1,2]

Mak, Brian [1]

Affiliations:

[1] Hong Kong University of Science and Technology, Department of Computer Science and Engineering, Clear Water Bay, Hong Kong, People's Republic of China

[2] Logistics and Supply Chain MultiTech R&D Centre, Pok Fu Lam, Hong Kong, People's Republic of China

Source:

INTERSPEECH 2022 | 2022

Keywords:

text-to-speech; neural speech synthesis; accent conversion; foreign accent;

DOI:

10.21437/Interspeech.2022-11124

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206; 082403;

Abstract:

This paper investigates how to reduce foreign accent when synthesizing native (L1) speech for a non-native (L2) speaker. We focus on two major aspects of foreign accent: mispronunciations and improper prosody (rhythm, phoneme duration, and pauses). First, to reduce mispronunciations, the mel-spectrograms generated by an L2 text-to-speech (TTS) model are fed to a pre-trained speech recognizer, and the mispronunciation information is fed back to the TTS model during back-propagation to help it learn correct native mel-spectrograms. Second, to imitate L1 prosody, a recent data augmentation (DA) technique originally proposed for speaking style transfer is applied to transfer the L1 speaking style to L2 speakers; the DA technique creates additional L2 speech in which the L2 speakers imitate L1 utterances. Automatic speech recognition of the native-accented speech synthesized for non-native speakers by the proposed method yields a lower word error rate. Speaker embeddings extracted by a pre-trained speaker verifier from the L2 speakers' original speech and from their synthesized speech are highly similar. Finally, subjective mean opinion scores (MOS) show that the synthesized speech has good quality and reduced accentedness.
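The abstract reports two objective measures: the word error rate (WER) of a speech recognizer on the synthesized speech, and the similarity between speaker embeddings from a pre-trained speaker verifier. The following is a minimal Python sketch of both measures, assuming the standard word-level Levenshtein definition of WER and cosine similarity as the embedding comparison. The record does not state which similarity measure the authors used, and the function names are hypothetical. The recognizer and speaker verifier themselves are out of scope; their outputs (transcripts and embedding vectors) are taken as given.

```python
import math
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker-embedding vectors (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy transcripts and embeddings for illustration only, not data from the paper.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))  # 1 substitution / 6 words
    print(cosine_similarity([0.2, 0.9, 0.1], [0.25, 0.85, 0.05]))
```

A lower WER means the recognizer's transcript is closer to the input text. A cosine similarity near 1 means the two embeddings point in nearly the same direction, which is how speaker-verification embeddings are commonly compared.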

Pages: 4302-4306

Page count: 5
