Advancements in Expressive Speech Synthesis: a Review

Cited by: 0
Authors
Alwaisi, Shaimaa [1 ]
Nemeth, Geza [1 ]
Affiliations
[1] Budapest Univ Technol & Econ, Fac Elect Engn & Informat, Dept Telecommun & Media Informat, Budapest, Hungary
Source
Infocommunications Journal | 2024 / Vol. 16 / No. 1
Keywords
Speech style; Expressivity; Emotional speech; Expressive TTS; Prosody modification; Multi-lingual and multi-speaker TTS; Speaker adaptation; Voice conversion; Text; Model; TTS
DOI
10.36244/ICJ.2024.1.5
Chinese Library Classification (CLC): TN [Electronic technology, communication technology]
Discipline code: 0809
Abstract
In recent years, we have witnessed fast and widespread adoption of speech synthesis technology, leading to a transition toward a society eager to incorporate these applications into daily life. We provide a comprehensive survey of recent advancements in the field of expressive Text-To-Speech (TTS) systems. Among the different methods of representing expressivity, this paper focuses on the development of expressive TTS systems, emphasizing the methodologies employed to enhance the quality and expressiveness of synthetic speech, such as style transfer and improving speaker variability. We then point out some of the subjective and objective metrics used to evaluate the quality of synthesized speech. Finally, we examine the realm of child speech synthesis, a domain that has been neglected for some time; research in children's speech synthesis remains wide open for exploration and development. Overall, this paper presents a comprehensive overview of historical and contemporary trends and future directions in speech synthesis research.
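The abstract's mention of objective evaluation metrics can be made concrete. One widely reported objective metric for synthesized speech is mel-cepstral distortion (MCD). The sketch below is illustrative only, not a method from the paper: it assumes mel-cepstral coefficient frames have already been extracted and time-aligned, and the function name and inputs are hypothetical.

```python
import math

def mel_cepstral_distortion(ref_frames, syn_frames):
    """Frame-averaged mel-cepstral distortion (dB) between two
    time-aligned sequences of mel-cepstral coefficient vectors.
    The 0th coefficient (frame energy) is excluded by convention."""
    # Standard MCD scaling constant: (10 / ln 10) * sqrt(2)
    k = (10.0 / math.log(10.0)) * math.sqrt(2.0)
    n_frames = min(len(ref_frames), len(syn_frames))
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Euclidean distance over coefficients 1..D of this frame pair
        total += k * math.sqrt(sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:])))
    return total / n_frames

# Toy example: identical frame sequences give zero distortion.
ref = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
print(mel_cepstral_distortion(ref, ref))  # 0.0
```

In practice the reference and synthesized utterances differ in length, so frames are usually aligned with dynamic time warping before the distance is averaged; lower MCD indicates spectra closer to the reference recording.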
Pages: 35-46
Page count: 12
Related Papers
50 in total
  • [31] A framework towards expressive speech analysis and synthesis with preliminary results
    Raptis, Spyros; Karabetsos, Sotiris; Chalamandaris, Aimilios; Tsiakoulis, Pirros
    Journal on Multimodal User Interfaces, 2015, 9(4): 387-394
  • [32] What type of inputs will we need for expressive speech synthesis?
    Campbell, N
    Proceedings of the 2002 IEEE Workshop on Speech Synthesis, 2002: 95-98
  • [33] Rigid head motion in expressive speech animation: Analysis and synthesis
    Busso, Carlos; Deng, Zhigang; Grimm, Michael; Neumann, Ulrich; Narayanan, Shrikanth
    IEEE Transactions on Audio, Speech and Language Processing, 2007, 15(3): 1075-1086
  • [34] Limited domain synthesis of expressive military speech for animated characters
    Johnson, WL; Narayanan, S; Whitney, R; Das, R; Bulut, M; LaBore, C
    Proceedings of the 2002 IEEE Workshop on Speech Synthesis, 2002: 163-166
  • [35] Pitch contour modelling and modification for expressive Marathi speech synthesis
    Deo, Rohit S.; Deshpande, Pallavi S.
    2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2014: 2455-2458
  • [36] Expressive speech synthesis using prosodic modification for Marathi language
    Anil, Manjare Chandraprabha; Shirbahadurkar, S. D.
    2nd International Conference on Signal Processing and Integrated Networks (SPIN), 2015: 126-130
  • [37] Contribution to the design of an expressive speech synthesis system for the Arabic language
    Demri, Lyes; Falek, Leila; Teffahi, Hocine
    Speech and Computer (SPECOM 2015), 2015, LNCS 9319: 178-185
  • [38] Can we generate emotional pronunciations for expressive speech synthesis?
    Tahon, Marie; Lecorve, Gwenole; Lolive, Damien
    IEEE Transactions on Affective Computing, 2020, 11(4): 684-695
  • [39] Expressive speech synthesis via modeling expressions with variational autoencoder
    Akuzawa, Kei; Iwasawa, Yusuke; Matsuo, Yutaka
    19th Annual Conference of the International Speech Communication Association (Interspeech 2018), 2018: 3067-3071
  • [40] Emphatic speech generation with conditioned input layer and bidirectional LSTMs for expressive speech synthesis
    Li, Runnan; Wu, Zhiyong; Huang, Yuchen; Jia, Jia; Meng, Helen; Cai, Lianhong
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018: 5129-5133