Novel Eigenpitch-based Prosody Model for Text-to-Speech Synthesis

Cited by: 0

Authors:

Tian, Jilei

Nurminen, Jani

Kiss, Imre

Affiliation:

Source:

INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 | 2007

Keywords:

prosodic modeling; pitch; eigenpitch; text-to-speech

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Prosody is an inherent supra-segmental feature in speech that human speakers employ to express, for example, attitude, emotion, intent and attention. In text-to-speech (TTS) systems, high naturalness can only be achieved if the prosody of the output is appropriate. The importance of prosody is even more crucial for tonal languages, such as Mandarin Chinese, in which the tone of each syllable is described by its pitch contour. In this paper, we propose a novel prosody modeling approach that uses the concept of syllable-based eigenpitch. The approach has been implemented in our Mandarin TTS system resulting in less than 0.1% error variance. The results obtained in practical experiments have confirmed the good performance of the proposed technique.


Pages: 313-316

Page count: 4

50 records in total

[1] A RULE BASED PROSODY MODEL FOR TURKISH TEXT-TO-SPEECH SYNTHESIS
Uslu, Ibrahim Baran
Ilk, Hakki Gokhan
Yilmaz, Asim Egemen
[J]. TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2013, 20 (02): 217-223
[2] Towards a multilingual prosody model for text-to-speech
Jokisch, O
Ding, HW
Kruschke, H
[J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002: 421-424
[3] Evaluation of Prosody in Text-to-Speech Synthesis System of Bangla
Basu, Tulika
Saha, Arup
[J]. 2013 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2013 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE), 2013
[4] Prosody model in a Mandarin Text-to-Speech System based on a hierarchical approach
Pan, NH
Jen, WT
Yu, SS
Yu, MS
Huang, SY
Wu, MJ
[J]. 2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000: 448-451
[5] Improving the Prosody of RNN-based English Text-To-Speech Synthesis by Incorporating a BERT model
Kenter, Tom
Sharma, Manish
Clark, Rob
[J]. INTERSPEECH 2020, 2020: 4412-4416
[6] Speech Modification for Prosody Conversion in Expressive Marathi Text-to-Speech Synthesis
Anil, Manjare Chandraprabha
Shirbahadurkar, S. D.
[J]. 2014 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2014: 56-58
[7] Combining conversational speech with read speech to improve prosody in Text-to-Speech synthesis
O'Mahony, Johannah
Lai, Catherine
King, Simon
[J]. INTERSPEECH 2022, 2022: 3388-3392
[8] Dealing with prosody in a text-to-speech system
Goldsmith J.
[J]. International Journal of Speech Technology, 1999, 3 (1): 51-63
[9] PROSODYSPEECH: TOWARDS ADVANCED PROSODY MODEL FOR NEURAL TEXT-TO-SPEECH
Yi, Yuanhao
He, Lei
Pan, Shifeng
Wang, Xi
Xiao, Yujia
[J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 7582-7586
[10] Improving Speech Prosody of Audiobook Text-To-Speech Synthesis with Acoustic and Textual Contexts
Xin, Detai
Adavanne, Sharath
Ang, Federico
Kulkarni, Ashish
Takamichi, Shinnosuke
Saruwatari, Hiroshi
[J]. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2023
