CLUSTERING OF DURATION PATTERNS IN SPEECH FOR TEXT-TO-SPEECH SYNTHESIS

Authors
Sreelekshmi, K. S. [1]
Gopinath, Deepa P. [1]
Affiliation
[1] College of Engineering, Department of Electronics and Communication, Trivandrum 695017, Kerala, India
Keywords
Speech synthesis; duration models; cluster analysis; k-means clustering; silhouette plot
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Synthesis of natural-sounding speech is the greatest challenge in a Text-to-Speech Synthesis (TTS) system. In natural speech, duration, intensity and pitch vary dynamically, and these variations are perceived as the rhythm, or prosody, of speech. If these variations are not recreated, the synthesized speech sounds robotic. The quality of synthesized speech therefore depends on how well duration and intonation patterns are imposed on the speech segments. The best way to improve naturalness is to mimic the way the human brain imposes rhythm. We speak in a particular style by varying the durations of speech segments in words and phrases according to specific duration patterns, and the brain may retrieve the corresponding patterns at the time of speaking to generate discourse in a particular style (news reading, Bible reading, storytelling, etc.). The main objective of this work is to investigate, using cluster analysis, whether such duration patterns exist in natural speech. Speech in Malayalam, an Indian language, was analyzed. Cluster analysis was performed on isolated words as well as on words and phrases in continuous speech. The results, examined using silhouette plots, showed that duration patterns exist in speech.
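The cluster analysis named in the abstract (k-means clustering, judged with silhouette plots) can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: it assumes each word is represented by a fixed-length vector of segment durations (here, three syllable durations in milliseconds), it uses invented toy data rather than the Malayalam recordings, and it seeds k-means with a deterministic farthest-first rule chosen here only for reproducibility. The names `DURATIONS`, `kmeans` and `silhouette` are hypothetical. The silhouette of word i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is its mean distance to the other members of its own cluster and b(i) is its lowest mean distance to the members of any other cluster.

```python
import math

# Hypothetical toy data, invented for illustration (NOT measurements from the paper):
# each row is one word's three syllable durations in milliseconds.
DURATIONS = [
    [120, 95, 180], [125, 90, 175], [118, 100, 185],   # lengthened final syllable
    [200, 90, 100], [210, 85, 95], [195, 95, 105],     # lengthened initial syllable
    [110, 160, 105], [105, 170, 100], [115, 165, 110],  # lengthened medial syllable
]


def dist(a, b):
    """Euclidean distance between two duration vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-first seeding; returns one label per point."""
    # Seeding (an assumption, not from the paper): start from the first point,
    # then repeatedly add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Per-point silhouette values; requires at least two non-empty clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # lowest mean distance to any other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return scores


if __name__ == "__main__":
    means = {}
    for k in (2, 3, 4):
        scores = silhouette(DURATIONS, kmeans(DURATIONS, k))
        means[k] = sum(scores) / len(scores)
        print(f"k={k}: mean silhouette {means[k]:.3f}")
    best = max(means, key=means.get)
    labels = kmeans(DURATIONS, best)
    # Text stand-in for a silhouette plot: one value per word, grouped by cluster.
    for vec, lab, s in zip(DURATIONS, labels, silhouette(DURATIONS, labels)):
        print(f"cluster {lab}  durations {vec}  silhouette {s:.2f}")
```

Running the script prints the mean silhouette for k = 2, 3 and 4 and then the per-word values for the best k. A silhouette plot draws these per-word values as sorted bars grouped by cluster. Values close to 1 mean a word's duration vector sits well inside its own cluster, which is the kind of evidence the abstract relies on when it concludes that distinct duration patterns exist. Picking k by the highest mean silhouette is one common heuristic, assumed here rather than taken from the paper.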
Pages: 1122-1127
Number of pages: 6