Modeling segmental duration for Turkish text-to-speech

Authors
Öztürk, Ö [1]
Çiloglu, T [1]
Affiliation
[1] Dokuz Eylul Univ, Elekt Elekt Muhendisligi Bolumu, Ankara, Turkey
DOI
10.1109/SIU.2004.1338312
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-speech (TTS) synthesis can be regarded as the automatic transformation of sentences from text into speech waveforms by machine. The most crucial problem confronting TTS systems is generating natural-sounding speech. To obtain natural-sounding synthetic speech, prosodic attributes such as pitch frequency, duration, and intensity should be modeled appropriately. This paper summarizes efforts to obtain duration models for Turkish TTS systems using machine-learning algorithms. In natural speech, segment durations are highly correlated with context: identical or similar phones differ in energy, duration, and fundamental frequency depending on their context. Natural-sounding TTS output therefore requires modeling these context-dependent prosodic variations. Different methods of modeling duration have been applied over the years. Here, two corpus-based statistical methods, linear regression and the C4.5 decision tree, are employed to model segment durations in Turkish.
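As a rough illustration of the linear-regression approach named in the abstract, the sketch below fits segment duration as a weighted sum of binary context features. It is not the authors' model: the three features (`is_vowel`, `is_stressed`, `is_phrase_final`), the helper names, and the durations are all invented for the example, and the toy data is generated from an exact linear rule so that the fit is easy to inspect.

```python
# Toy illustration only: feature names and durations are invented,
# not taken from the paper's Turkish corpus.

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    xs = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept term
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row):
    """Predicted duration: intercept plus the weighted context features."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Columns: is_vowel, is_stressed, is_phrase_final (hypothetical context features).
contexts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
            (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
# Synthetic durations in milliseconds, built from an assumed additive rule.
durations_ms = [60 + 30 * v + 15 * s + 25 * f for v, s, f in contexts]

weights = fit_linear(contexts, durations_ms)
predicted = predict(weights, (1, 1, 1))  # stressed, phrase-final vowel
```

Because the toy durations are exactly additive, the fitted weights should match the generating rule up to floating-point error. A decision tree such as C4.5 would instead split the contexts into groups and predict per group, which can capture interactions between features that a purely additive model misses.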
Pages: 272-275
Page count: 4