Time and space-efficient architecture for a corpus-based text-to-speech synthesis system

Cited by: 17

Authors:

Rojc, Matej [1]

Kacic, Zdravko [1]

Affiliations:

[1] Univ Maribor, Fac Elect Engn & Comp Sci, SLO-2000 Maribor, Slovenia

Source:

SPEECH COMMUNICATION | 2007, Vol. 49, Issue 3

Keywords:

corpus-based text-to-speech system; finite-state machines (FSM); heterogeneous-relation graphs (HRG); queuing mechanism;

DOI:

10.1016/j.specom.2007.01.007

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper proposes a time and space-efficient architecture for a text-to-speech synthesis system (TTS). The proposed architecture can be efficiently used in those applications with unlimited domain, requiring multilingual or polyglot functionality. The integration of a queuing mechanism, heterogeneous graphs and finite-state machines gives a powerful, reliable and easily maintainable architecture for the TTS system. Flexible and language-independent framework efficiently integrates all those algorithms used within the scope of the TTS system. Heterogeneous relation graphs are used for linguistic information representation and feature construction. Finite-state machines are used for time and space-efficient representation of language resources, for time and space-efficient lookup processes, and the separation of language-dependent resources from a language-independent TTS engine. Its queuing mechanism consists of several dequeue data structures and is responsible for the activation of all those TTS engine modules having to process the input text. In the proposed architecture, all modules use the same data structure for gathering linguistic information about input text. All input and output formats are compatible, the structure is modular and interchangeable, it is easily maintainable and object oriented. The proposed architecture was successfully used when implementing the Slovenian PLATTOS corpus-based TTS system, as presented in this paper. (c) 2007 Elsevier B.V. All rights reserved.
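
As a rough illustration of the queuing idea described in the abstract, the sketch below chains processing modules through deques, with every module reading and extending one shared structure. It is a minimal sketch under stated assumptions, not the PLATTOS implementation: the `Utterance` class, the two modules, the relation names, and `run_pipeline` are all hypothetical, and the dictionary of relations is only a much simplified stand-in for a heterogeneous relation graph.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """Shared structure that every module reads and extends
    (hypothetical, simplified stand-in for a relation graph)."""
    text: str
    relations: dict = field(default_factory=dict)


def tokenize(utt):
    # Hypothetical module: split the raw text into word tokens.
    utt.relations["Token"] = utt.text.split()
    return utt


def lowercase_words(utt):
    # Hypothetical module: derive a normalized word relation from the tokens.
    utt.relations["Word"] = [t.lower() for t in utt.relations["Token"]]
    return utt


def run_pipeline(sentences, modules):
    # One input deque per module, plus a final deque for finished items.
    queues = [deque() for _ in range(len(modules) + 1)]
    queues[0].extend(Utterance(s) for s in sentences)
    # Activate each module for as long as its input deque holds work.
    for i, module in enumerate(modules):
        while queues[i]:
            queues[i + 1].append(module(queues[i].popleft()))
    return list(queues[-1])


results = run_pipeline(["Dober dan", "Hvala lepa"], [tokenize, lowercase_words])
```

Because every module takes and returns the same structure, modules can be reordered or swapped without changing the queue logic, which is the kind of modularity the abstract attributes to the architecture.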


Pages: 230-249

Page count: 20
