A Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-Speech Synthesis With Multivariate Information Minimization

Cited by: 1
Authors
Cheon, Sung Jun [1 ,2 ]
Choi, Byoung Jin [1 ,2 ]
Kim, Minchan [1 ,2 ]
Lee, Hyeonseung [1 ,2 ]
Kim, Nam Soo [1 ,2 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 08826, South Korea
Keywords
Training; Upper bound; Speech synthesis; Correlation; Mutual information; Synthesizers; Estimation; Disentanglement; mutual information; speech synthesis; style modeling; total correlation
DOI
10.1109/LSP.2021.3125259
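Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract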
In this letter, we propose a multivariate information minimization method that disentangles three or more latent representations. We show that control factors can be disentangled by minimizing interactive dependency, which can be expressed as a sum of mutual information upper bound terms. Since the upper bound estimate converges from the early training stage, there is little performance degradation due to auxiliary loss. The proposed technique is applied to train a text-to-speech synthesizer with multi-lingual, multi-speaker, and multi-style corpora. Subjective listening tests validate that the proposed method can improve the synthesizer in terms of quality as well as controllability.
Pages: 55-59
Page count: 5
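The abstract states that the dependency among three or more latent factors can be written as a sum of mutual information terms, each of which is then upper-bounded. The sketch below is only a toy illustration of that first step, not the letter's estimator: it assumes three small discrete variables (standing in for, say, language, speaker, and style factors) and checks the standard chain identity TC(X, Y, Z) = I(X; Y) + I(X, Y; Z). The function names and the toy distribution are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def marginal(joint, axes):
    """Sum a joint pmf {(x, y, z): p} down to the variables listed in axes."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[a] for a in axes)] += p
    return dict(out)


def entropy(pmf):
    """Shannon entropy in nats of a pmf given as {outcome: probability}."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)


def mutual_information(joint, axes_a, axes_b):
    """I(A; B) = H(A) + H(B) - H(A, B) for two groups of variables."""
    h_a = entropy(marginal(joint, axes_a))
    h_b = entropy(marginal(joint, axes_b))
    h_ab = entropy(marginal(joint, axes_a + axes_b))
    return h_a + h_b - h_ab


def total_correlation(joint, n_vars):
    """TC = sum of single-variable entropies minus the joint entropy."""
    singles = sum(entropy(marginal(joint, (i,))) for i in range(n_vars))
    return singles - entropy(joint)


def toy_joint():
    """Assumed toy data: x, y are fair independent bits; z = x XOR y w.p. 0.9."""
    joint = {}
    for x, y, z in itertools.product((0, 1), repeat=3):
        p_z = 0.9 if z == (x ^ y) else 0.1
        joint[(x, y, z)] = 0.25 * p_z
    return joint


joint = toy_joint()
tc = total_correlation(joint, 3)
chain = mutual_information(joint, (0,), (1,)) + mutual_information(joint, (0, 1), (2,))
pairwise_xz = mutual_information(joint, (0,), (2,))
print(f"total correlation = {tc:.4f}, chain sum = {chain:.4f}")
```

In this toy distribution every pair of variables is independent, yet the total correlation is positive, which is one reason a multivariate measure may be preferable to pairwise penalties alone. In a neural synthesizer the exact terms are not computable, so each mutual information term would be replaced by a trainable upper-bound estimate, as the abstract describes.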