A waveform concatenation technique for text-to-speech synthesis

Cited by: 7

Authors:

Panda S.P. [1]

Nayak A.K. [2]

Affiliations:

[1] Department of CSE, Silicon Institute of Technology, Bhubaneswar, Odisha

[2] Department of CS&IT, Siksha ‘O’ Anusandhan University, Bhubaneswar, Odisha

Source:

International Journal of Speech Technology | 2017 / Vol. 20 / Issue 4

Keywords:

Concatenative technique; Indian languages; Speech synthesis; Text-to-speech system; Waveform concatenation;

DOI:

10.1007/s10772-017-9463-8

Abstract:

Designing text-to-speech systems capable of producing natural-sounding speech in different Indian languages is a challenging and ongoing problem. Because of the large number of possible pronunciations in these languages, a concatenative speech synthesis technique must store a large number of speech segments in its speech database to achieve highly natural speech. However, the resulting database size makes such a system unusable on small handheld devices or in human-computer interaction systems with limited storage resources. In this paper, we propose a fraction-based waveform concatenation technique that produces intelligible speech segments from a small-footprint speech database. The results of all the experiments performed show that the proposed technique is effective in producing intelligible speech segments in different Indian languages, with far less storage and computation overhead than the existing syllable-based technique. © 2017, Springer Science+Business Media, LLC.

Pages: 959-976

Page count: 17
