Increasing Prosodic Variability of Text-To-Speech Synthesizers

Authors
Nemeth, Geza [1]
Fek, Mark [1]
Csapo, Tamas Gabor [1]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
Keywords
speech synthesis; prosodic variability; F-0; variation; transplantation;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The lack of prosody variation in text-to-speech systems contributes to their perceived unnaturalness when synthesizing extended passages. In this paper, we present a method to improve prosody generation in this direction. A database of natural sample sentences is searched for sentences having similar word and syllable structure to the input. One sentence is selected randomly from the similar sentences found. The prosody of the randomly selected natural sentence is used as a target to generate the prosody of the synthetic one. An experiment was conducted to determine the potential of the proposed method. The rule-based pitch contour generation of a Hungarian concatenative synthesizer was replaced by a semi-automatic implementation of the proposed method. A listening test showed that subjects preferred sentences synthesized by the proposed method over a rule-based solution.
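The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how similarity is measured, so the matching rule here (same word count, per-word syllable counts within a tolerance), the vowel-letter syllable approximation, and all names (`Sample`, `syllable_structure`, `find_similar`, `pick_target`, `max_diff`) are assumptions made for illustration, and the F0 values are placeholder numbers.

```python
import random
from dataclasses import dataclass

# Assumption: the number of vowel letters approximates the number of
# syllables in a Hungarian word.
VOWELS = set("aáeéiíoóöőuúüű")


def syllable_structure(sentence: str) -> tuple[int, ...]:
    """Return the approximate syllable count of each word in the sentence."""
    words = sentence.lower().split()
    return tuple(sum(ch in VOWELS for ch in word) for word in words)


@dataclass
class Sample:
    """A natural sentence with its (hypothetical) per-syllable F0 values in Hz."""
    text: str
    f0_contour: list[float]


def find_similar(structure, database, max_diff=0):
    """Return samples with the same word count whose per-word syllable
    counts differ from `structure` by at most `max_diff`."""
    matches = []
    for sample in database:
        candidate = syllable_structure(sample.text)
        if len(candidate) == len(structure) and all(
            abs(a - b) <= max_diff for a, b in zip(candidate, structure)
        ):
            matches.append(sample)
    return matches


def pick_target(sentence, database, rng, max_diff=0):
    """Randomly pick one similar natural sentence to serve as the prosody
    target, or return None when nothing in the database is similar."""
    matches = find_similar(syllable_structure(sentence), database, max_diff)
    return rng.choice(matches) if matches else None


# Placeholder database; the F0 values are made up for the example.
database = [
    Sample("Az alma piros", [120.0, 135.0, 128.0, 118.0, 105.0]),
    Sample("A macska alszik", [125.0, 140.0, 130.0, 112.0, 100.0]),
    Sample("Jó reggelt", [130.0, 122.0, 110.0]),
]

target = pick_target("A kutya ugat", database, random.Random(0))
if target is not None:
    print(target.text, target.f0_contour)
```

Because the target is drawn at random from all similar sentences, repeated synthesis of structurally similar inputs can yield different pitch contours, which is the source of the prosodic variability the abstract aims for. In a fuller system the selected contour would still have to be mapped onto the synthetic sentence; that step is not shown here.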
Pages: 1981-1984
Page count: 4