Hierarchical Transfer Learning for Text-to-Speech in Indonesian, Javanese, and Sundanese Languages

Authors
Azizah, Kurniawati [1]
Adriani, Mirna [1]
Affiliation
[1] Univ Indonesia, Fac Comp Sci, Depok, Indonesia
Keywords
deep learning; hierarchical transfer learning; low-resource problem; Indonesian; Javanese; Sundanese; text-to-speech; ALGORITHMS
DOI
10.1109/icacsis51025.2020.9263086
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This research develops end-to-end deep learning-based text-to-speech (TTS) for Indonesian, Javanese, and Sundanese. Although end-to-end neural TTS systems such as Tacotron-2 have made remarkable progress recently, they still suffer from data scarcity in low-resource languages such as Javanese and Sundanese. Our preliminary study shows that Tacotron-2-based TTS needs a large amount of training data: at least 10 hours are required for the model to synthesize intelligible speech of acceptable quality. To address this low-resource problem, we propose a hierarchical transfer learning scheme for training Javanese and Sundanese TTS that leverages a dissimilar high-resource language (English) and a similar intermediate-resource language (Indonesian). Evaluated by mean opinion score (MOS), the synthesized speech reaches 4.27 for Indonesian, 4.08 for Javanese, and 3.92 for Sundanese. Word accuracy (WAcc) on semantically unpredictable sentences (SUS) reaches 98.26% for Indonesian, 95.02% for Javanese, and 95.43% for Sundanese. These subjective evaluations of synthetic speech quality demonstrate that our transfer learning scheme can be successfully applied to TTS models for low-resource target domains. With less than one hour of training data (38 minutes for Indonesian, 16 minutes for Javanese, and 19 minutes for Sundanese), the TTS models learn quickly and achieve adequate performance.
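The abstract reports two listening-test metrics, MOS and WAcc on SUS, without giving their formulas. The sketch below assumes the common definitions rather than the authors' exact procedure: WAcc is taken as one minus the pooled word error rate, where errors are the word-level Levenshtein distance (substitutions + deletions + insertions) between each SUS reference and a listener's transcription, and MOS is the arithmetic mean of 1 to 5 ratings with a normal-approximation 95% confidence interval. The function names and the example sentences are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: minimum substitutions + deletions
    + insertions turning the reference words into the hypothesis words.
    Assumption: case-insensitive, whitespace tokenization, no punctuation handling."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # prev[j] = distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of an extra word
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(pairs):
    """Assumed WAcc = 1 - (total word errors / total reference words),
    pooled over (reference, listener transcription) pairs."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    n_words = sum(len(ref.split()) for ref, _ in pairs)
    if n_words == 0:
        raise ValueError("no reference words")
    return 1.0 - errors / n_words


def mos(ratings):
    """Mean opinion score and half-width of a normal-approximation
    95% confidence interval (needs at least two ratings)."""
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mean(ratings), half_width


if __name__ == "__main__":
    # Hypothetical SUS references and listener transcriptions.
    pairs = [("saya makan nasi goreng", "saya makan nasi goreng"),
             ("kucing itu tidur", "kucing tidur")]
    print(round(word_accuracy(pairs), 3))  # 1 error over 7 words -> 0.857
    print(mos([4, 5, 4, 3, 5]))            # hypothetical 1-5 ratings
```

Pooling errors over all words, rather than averaging per-sentence accuracies, keeps longer sentences from being underweighted; note that under this definition heavy insertions can push WAcc below zero.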
Pages: 421-428
Page count: 8