Conditional Random Fields for Hierarchical Segment Selection in Text-to-Speech Synthesis

Cited by: 0

Authors:

Weiss, Christian [1]

Hess, Wolfgang [1]

Affiliations:

[1] Univ Bonn, Inst Commun Res, D-5300 Bonn, Germany

Source:

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006

Keywords:

Speech Synthesis; Unit Selection; CRF;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we present the statistically motivated conditional random fields (CRF) approach to concatenative TTS. We use contextual CRFs for speech segment selection and concatenate the selected segments into an acoustic speech waveform. The CRF approach is used in our corpus-based TTS system AVISS. The acoustic synthesis module consists of context-dependent CRF models trained on a multi-level acoustic unit inventory, to which we apply a hierarchical top-down search to select appropriate segments. The acoustic synthesis is easily adaptable to other languages: the only requirement is a language-specific module for text and symbolic preprocessing, as well as duration and F0 prediction, which can be performed by a prosodic module. The system shows good results in the generated speech waveforms. The CRF approach is usable for acoustic units as well as for parametric synthesis, where the speech parameters are generated by CRFs and the speech waveform is produced by a synthesis filter.

Pages: 2026-2029

Page count: 4
