A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept

Cited by: 1
Authors
Freixes, Marc [1]
Alias, Francesc [1]
Claudi Socoro, Joan [1]
Affiliations
[1] La Salle Univ Ramon Llull, Grup Recerca Tecnol Media GTM, Quatre Camins 30, Barcelona 08022, Spain
Keywords
Text-to-speech; Unit selection; Speech synthesis; Singing synthesis; Speech-to-singing; VOICE SYNTHESIS SYSTEM; PLUS NOISE MODEL; QUALITY
DOI
10.1186/s13636-019-0163-y
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Text-to-speech (TTS) synthesis systems have been widely used in general-purpose applications based on the generation of speech. Nonetheless, there are some domains, such as storytelling or voice output aid devices, which may also require singing. To enable a corpus-based TTS system to sing, a supplementary singing database should be recorded. This solution, however, might be too costly for eventual singing needs, or even unfeasible if the original speaker is unavailable or unable to sing properly. This work introduces a unit selection-based text-to-speech-and-singing (US-TTS&S) synthesis framework, which integrates speech-to-singing (STS) conversion to enable the generation of both speech and singing from an input text and a score, respectively, using the same neutral speech corpus. The viability of the proposal is evaluated considering three vocal ranges and two tempos on a proof-of-concept implementation using a 2.6-h Spanish neutral speech corpus. The experiments show that challenging STS transformation factors are required to sing beyond the corpus vocal range and/or with notes longer than 150 ms. While score-driven US configurations allow the reduction of pitch-scale factors, time-scale factors are not reduced due to the short length of the spoken vowels. Moreover, in the MUSHRA test, text-driven and score-driven US configurations obtain similar naturalness rates of around 40 for all the analysed scenarios. Although these naturalness scores are far from those of Vocaloid, the singing scores of around 60 which were obtained validate that the framework could reasonably address eventual singing needs.
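The pitch-scale and time-scale factors mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definitions (pitch-scale factor = target note frequency / F0 of the selected spoken unit; time-scale factor = target note duration / duration of the spoken vowel); the record does not confirm that the paper defines them exactly this way. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
def midi_to_hz(midi_note: float) -> float:
    """Convert a MIDI note number to Hz (equal temperament, A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def sts_factors(unit_f0_hz: float, unit_dur_ms: float,
                note_midi: float, note_dur_ms: float) -> tuple[float, float]:
    """Return (pitch_scale, time_scale) needed to turn a spoken vowel into a sung note.

    Assumed definitions (not confirmed by the source):
      pitch_scale = target note frequency / F0 of the spoken unit
      time_scale  = target note duration  / duration of the spoken unit
    """
    pitch_scale = midi_to_hz(note_midi) / unit_f0_hz
    time_scale = note_dur_ms / unit_dur_ms
    return pitch_scale, time_scale


# Hypothetical example: a 75 ms spoken vowel at 110 Hz must become
# a 300 ms sung A3 (MIDI note 57, 220 Hz).
pitch_scale, time_scale = sts_factors(110.0, 75.0, 57, 300.0)
print(f"pitch-scale factor: {pitch_scale:.2f}, time-scale factor: {time_scale:.2f}")
```

Under these assumptions, selecting units whose F0 is already close to the score pitch moves the pitch-scale factor toward 1, whereas a short spoken vowel still has to be stretched to fill a long note. This is consistent with the abstract's remark that score-driven selection reduces pitch-scale factors but not time-scale factors.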
Pages: 14