An LSTM-based model for the compression of acoustic inventories for corpus-based text-to-speech synthesis systems

Cited by: 5

Authors:

Rojc, Matej [1]

Mlakar, Izidor [1]

Affiliations:

[1] Univ Maribor, Fac Elect Engn & Comp Sci, Maribor, Slovenia

Source:

COMPUTERS & ELECTRICAL ENGINEERING | 2022 / Vol. 100

Funding:

EU Horizon 2020;

Keywords:

Concatenation costs; LSTM; Unit selection; Cost function; Corpus-based text-to-speech synthesis; Acoustic inventory optimisation;

DOI:

10.1016/j.compeleceng.2022.107942

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Large acoustic inventories must be used to produce speech close to natural quality. However, the concatenation cost space grows exponentially with the number of acoustic units in the acoustic inventory, increasing the latency of the unit selection algorithm and making such algorithms unusable in real-time end-to-end systems. Even when data compression techniques are introduced, the model size remains large, which is a challenge for end-to-end systems. Thus, in this paper, we propose representing the concatenation cost space using an LSTM (Long Short-Term Memory) model. The results show a 90% reduction in the size of the data space compared to all our previous techniques, and a decrease of over 70% in the look-up time. The proposed LSTM-based compression significantly increases the responsiveness of corpus-based text-to-speech systems while keeping the overall speech quality at the same level.

Pages: 10