Low-dimensional Style Token Control for Hyperarticulated Speech Synthesis

Times cited: 1
Authors
Nishihara, Miku [1]
Wells, Dan [2]
Richmond, Korin [2]
Pine, Aidan [3]
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci, Nagoya, Aichi, Japan
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
[3] Natl Res Council Canada, Ottawa, ON, Canada
Keywords
controllable speech synthesis; speech style embedding; hyperarticulated speech; SPEAKING; HEARING
DOI
10.21437/Interspeech.2024-2074
Abstract
Global style tokens (GSTs) allow for rich modelling of the variation in a speech corpus and subsequent control of text-to-speech synthesis (TTS). However, certain styles of speech may be marked by variation along multiple dimensions, complicating the interpretation and control of learned style tokens. One example is hyperarticulated or 'clear' speech, such as that directed toward listeners with hearing impairments or language learners in the classroom, which in English is characterised by reduced speaking rate, increased F0, more careful articulation of vowels and plosive consonants, and other factors. We present a method for simplifying control of style tokens by applying principal components analysis (PCA) to GST weights from a TTS system trained on both plain and clear speech. We identify the axes of variation in PCA space with the acoustic correlates of clear speech in English and show that we can synthesise either style by moving along a single dimension in that space.
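The idea of controlling style through a single PCA dimension can be illustrated with a minimal sketch. It is not the authors' implementation: the token count, the two style centres, the noise level, and the helper names `fit_pca` and `weights_at` are all assumptions made for illustration, and the weight matrix is synthetic rather than extracted from a trained TTS model. The sketch fits PCA to per-utterance style-token weights, then maps a position on the first principal axis back to token-weight space.

```python
import numpy as np

def fit_pca(weights):
    """Fit PCA via SVD. Rows are utterances, columns are style-token weights."""
    mean = weights.mean(axis=0)
    centered = weights - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean, vt, explained

def weights_at(mean, components, position, axis=0):
    """Map a position along one principal axis back to token-weight space."""
    return mean + position * components[axis]

# Synthetic stand-in data (assumption): 10 style tokens, 200 utterances per style.
rng = np.random.default_rng(0)
n_tokens, n_per_style = 10, 200
plain_centre = np.full(n_tokens, 0.1)
clear_centre = np.array([0.3, 0.2, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.03, 0.02])
plain = plain_centre + 0.01 * rng.standard_normal((n_per_style, n_tokens))
clear = clear_centre + 0.01 * rng.standard_normal((n_per_style, n_tokens))
weights = np.vstack([plain, clear])

mean, components, explained = fit_pca(weights)

# Project every utterance into PCA space and locate each style on axis 0.
scores = (weights - mean) @ components.T
plain_pos = scores[:n_per_style, 0].mean()
clear_pos = scores[n_per_style:, 0].mean()

# Move from plain (alpha=0) to clear (alpha=1) along the first axis only.
for alpha in (0.0, 0.5, 1.0):
    position = (1 - alpha) * plain_pos + alpha * clear_pos
    w = weights_at(mean, components, position)
    print(f"alpha={alpha:.1f}  first three token weights: {np.round(w[:3], 3)}")
```

In a real system the reconstructed vector would replace the style-token weights at synthesis time. Weights reconstructed this way are not guaranteed to be non-negative or to sum to one, so some renormalisation may be needed depending on how the model consumes them.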
Pages: 3385-3389
Number of pages: 5