Novel Eigenpitch-based Prosody Model for Text-to-Speech Synthesis

Authors
Tian, Jilei
Nurminen, Jani
Kiss, Imre
Keywords
prosodic modeling; pitch; eigenpitch; text-to-speech
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Prosody is an inherent suprasegmental feature of speech that human speakers use to express, for example, attitude, emotion, intent, and attention. In text-to-speech (TTS) systems, high naturalness can be achieved only if the prosody of the output is appropriate. Prosody is even more crucial for tonal languages such as Mandarin Chinese, in which the tone of each syllable is described by its pitch contour. In this paper, we propose a novel prosody modeling approach based on the concept of syllable-based eigenpitch. The approach has been implemented in our Mandarin TTS system, resulting in an error variance of less than 0.1%. Practical experiments have confirmed the good performance of the proposed technique.
Pages: 313-316
Page count: 4