An LSTM-based model for the compression of acoustic inventories for corpus-based text-to-speech synthesis systems

Cited by: 5
Authors
Rojc, Matej [1 ]
Mlakar, Izidor [1 ]
Affiliations
[1] Univ Maribor, Fac Elect Engn & Comp Sci, Maribor, Slovenia
Funding
EU Horizon 2020
Keywords
Concatenation costs; LSTM; Unit selection; Cost function; Corpus-based text-to-speech synthesis; Acoustic inventory optimisation;
DOI
10.1016/j.compeleceng.2022.107942
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Large acoustic inventories must be used to produce speech close to natural quality. However, the concatenation cost space grows exponentially with the number of acoustic units in the inventory, which increases the latency of the unit selection algorithm and makes such algorithms unusable in real-time end-to-end systems. Even when data compression techniques are introduced, the model size remains large, which is a challenge for end-to-end systems. In this paper, we therefore propose representing the concatenation cost space using an LSTM (Long Short-Term Memory) model. The results show a 90% reduction in the size of the data space compared to all our previous techniques, and a decrease of over 70% in look-up time. The proposed LSTM-based compression significantly increases the responsiveness of corpus-based text-to-speech systems while keeping the overall speech quality at the same level.
Pages: 10