Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis

Authors
Evrard, Marc [1]
Delalez, Samuel [1]
d'Alessandro, Christophe [1]
Rilliard, Albert [1]
Affiliations
[1] LIMSI CNRS, Audio & Acoust Grp, Rue John von Neumann, Campus Univ Orsay, Bat 508, F-91405 Orsay, France
Keywords
Calliphony; chironomy; prosody; prosodic synthesis; expressive synthesis; adaptive training; HTS; HMM; voice quality; intonation; emotion
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Chironomic stylization is the process of real-time modification of intonation contours (f0 and tempo) using drawing/writing gestures with a stylus on a graphic tablet. The question addressed in this research is whether hand-made intonation stylization could improve or degrade expressivity and overall quality, compared to statistical modeling of prosody. A system for expressive TTS in French based on HMM was designed. A neutral corpus and six expressive speech corpora were used (anger, fear, joy, sadness, sensuality, surprise). Five sentences were synthesized with the six types of expressivity through CMLLR adaptation. Using a chironomic system, three trained subjects were asked to modify synthetic sentences, aiming at improving their expressive quality. Natural, HMM-TTS, and HMM-TTS-Chironomic sentences were evaluated in an expressivity recognition test and a MOS test. The results show that chironomic modification brings significant improvements in both recognition and MOS tests. These results are discussed in detail, together with the effects of voice quality on the perception of HMM-TTS expressive speech. The two main conclusions are: (i) intonation of HMM-TTS can be significantly improved; (ii) hand-corrected TTS improves expressivity and overall quality. Chironomic stylization is a powerful tool lying between fully automatic TTS and recorded speech.
Pages: 3370-3374
Number of pages: 5