Low-dimensional Style Token Control for Hyperarticulated Speech Synthesis

Cited by: 1

Authors:

Nishihara, Miku ^{[1]}

Wells, Dan ^{[2]}

Richmond, Korin ^{[2]}

Pine, Aidan ^{[3]}

Affiliations:

[1] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi, Japan

[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland

[3] Natl Res Council Canada, Ottawa, ON, Canada

Source:

INTERSPEECH 2024 | 2024

Keywords:

controllable speech synthesis; speech style embedding; hyperarticulated speech; SPEAKING; HEARING;

DOI:

10.21437/Interspeech.2024-2074

Abstract:

Global style tokens (GSTs) allow for rich modelling of the variation in a speech corpus and subsequent control of text-to-speech synthesis (TTS). However, certain styles of speech may be marked by variation along multiple dimensions, complicating the interpretation and control of learned style tokens. One example is hyperarticulated or 'clear' speech, such as that directed toward listeners with hearing impairments or language learners in the classroom, which in English is characterised by reduced speaking rate, increased F0, more careful articulation of vowels and plosive consonants, and other factors. We present a method for simplifying control of style tokens by applying principal components analysis (PCA) to GST weights from a TTS system trained on both plain and clear speech. We identify the axes of variation in PCA space with the acoustic correlates of clear speech in English and show that we can synthesise either style by moving along a single dimension in that space.
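
The idea in the abstract (PCA over GST weights, then control along one axis) can be illustrated with a minimal sketch. This is not the authors' implementation: the token count, the synthetic Dirichlet "weights" standing in for real GST weights, and the helper names `fit_pca` and `weights_at` are all assumptions made for illustration. In a real system the weight matrix would come from the trained TTS model's style token layer.

```python
import numpy as np


def fit_pca(weights):
    """PCA of a (n_utterances, n_tokens) weight matrix via SVD.

    Returns the mean weight vector, the principal axes (one per row,
    ordered by decreasing variance) and the explained-variance ratios.
    """
    mean = weights.mean(axis=0)
    _, s, vt = np.linalg.svd(weights - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean, vt, explained


def weights_at(mean, components, position, axis=0):
    """Map a position along one principal axis back to token-weight space."""
    return mean + position * components[axis]


# Hypothetical data: 10 style tokens, 200 "plain" and 200 "clear" utterances.
# Each row sums to 1, mimicking softmax attention weights over tokens.
rng = np.random.default_rng(0)
n_tokens = 10
plain = rng.dirichlet(np.r_[8.0, np.ones(n_tokens - 1)], size=200)
clear = rng.dirichlet(np.r_[np.ones(n_tokens - 1), 8.0], size=200)
weights = np.vstack([plain, clear])

mean, components, explained = fit_pca(weights)

# Project every utterance onto the first principal axis and locate
# the average position of each style on that axis.
scores = (weights - mean) @ components[0]
plain_pos = scores[:200].mean()
clear_pos = scores[200:].mean()

# Sweep from the plain end to the clear end along a single dimension;
# each resulting vector could be fed to the synthesiser as token weights.
for position in np.linspace(plain_pos, clear_pos, 5):
    print(np.round(weights_at(mean, components, position), 3))
```

In this toy setting the two styles fall at opposite ends of the first principal axis, so a single scalar selects between them. Positions far outside the observed range can yield negative weights, so a practical system would likely restrict the control to the range seen in training data.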


Pages: 3385-3389

Page count: 5
