Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation

Cited by: 0
Authors
Tu, Tao [1 ]
Chen, Yuan-Jui [1 ]
Liu, Alexander H. [1 ]
Lee, Hung-yi [1 ]
Affiliations
[1] Natl Taiwan Univ, Coll Elect Engn & Comp Sci, Taipei, Taiwan
Source
INTERSPEECH 2020
Keywords
multi-speaker speech synthesis; semi-supervised learning; discrete speech representation;
DOI
10.21437/Interspeech.2020-1824
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification Code
100104; 100213;
Abstract
Recently, end-to-end multi-speaker text-to-speech (TTS) systems have achieved success in situations where large amounts of high-quality speech and the corresponding transcriptions are available. However, the laborious process of collecting paired data prevents many institutes from building high-performing multi-speaker TTS systems. In this work, we propose a semi-supervised learning approach for multi-speaker TTS. A multi-speaker TTS model can learn from untranscribed audio via the proposed encoder-decoder framework with discrete speech representation. The experimental results demonstrate that with only an hour of paired speech data, whether the paired data is from multiple speakers or a single speaker, the proposed model can generate intelligible speech in different voices. We found that the model benefits from the proposed semi-supervised learning approach even when part of the unpaired speech data is noisy. In addition, our analysis reveals that the speaker characteristics of the paired data have an impact on the effectiveness of semi-supervised TTS.
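The abstract only outlines the framework, so the following PyTorch snippet is a rough sketch of the general idea: a quantized (discrete) speech representation lets an encoder-decoder learn from untranscribed audio, while a small amount of paired data supervises a text-to-unit mapping. The module sizes, GRU layers, toy VQ codebook, and the frame-level alignment between text and speech units are all illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: semi-supervised training with a discrete speech representation.
# All architecture choices (GRUs, codebook size, frame-aligned text) are
# illustrative assumptions, not the paper's actual model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through gradient."""
    def __init__(self, num_codes=64, dim=128):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                   # z: (B, T, dim)
        dist = ((z.unsqueeze(-2) - self.codebook.weight) ** 2).sum(-1)
        idx = dist.argmin(-1)                               # discrete unit ids
        q = self.codebook(idx)                              # quantized vectors
        q_st = z + (q - z).detach()                         # straight-through
        vq_loss = F.mse_loss(q, z.detach()) + 0.25 * F.mse_loss(z, q.detach())
        return q_st, idx, vq_loss


class SpeechAutoencoder(nn.Module):
    """Audio -> discrete units -> audio, conditioned on a speaker embedding."""
    def __init__(self, n_mels=80, dim=128, num_codes=64, num_speakers=10):
        super().__init__()
        self.encoder = nn.GRU(n_mels, dim, batch_first=True)
        self.vq = VectorQuantizer(num_codes, dim)
        self.spk = nn.Embedding(num_speakers, dim)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, n_mels)

    def forward(self, mel, speaker):                        # mel: (B, T, n_mels)
        z, _ = self.encoder(mel)
        q, idx, vq_loss = self.vq(z)
        h, _ = self.decoder(q + self.spk(speaker).unsqueeze(1))
        return self.out(h), idx, vq_loss


class TextEncoder(nn.Module):
    """Predicts discrete speech units from text ids (paired data only).
    Assumes text tokens are pre-aligned to speech frames for simplicity."""
    def __init__(self, vocab=40, dim=128, num_codes=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.proj = nn.Linear(dim, num_codes)

    def forward(self, text):                                # text: (B, T)
        h, _ = self.rnn(self.emb(text))
        return self.proj(h)                                 # (B, T, num_codes)


ae, txt_enc = SpeechAutoencoder(), TextEncoder()
opt = torch.optim.Adam(list(ae.parameters()) + list(txt_enc.parameters()), lr=1e-3)


def unpaired_step(mel, speaker):
    """Untranscribed audio: reconstruction + VQ losses only."""
    recon, _, vq_loss = ae(mel, speaker)
    return F.l1_loss(recon, mel) + vq_loss


def paired_step(text, mel, speaker):
    """Paired data additionally supervises text -> discrete-unit prediction."""
    recon, idx, vq_loss = ae(mel, speaker)
    ce = F.cross_entropy(txt_enc(text).transpose(1, 2), idx)
    return F.l1_loss(recon, mel) + vq_loss + ce


# One toy optimisation step mixing both kinds of data.
mel_u, spk_u = torch.randn(4, 50, 80), torch.randint(0, 10, (4,))
mel_p, spk_p = torch.randn(2, 50, 80), torch.randint(0, 10, (2,))
txt_p = torch.randint(0, 40, (2, 50))
loss = unpaired_step(mel_u, spk_u) + paired_step(txt_p, mel_p, spk_p)
opt.zero_grad(); loss.backward(); opt.step()
```

The sketch mirrors the core idea stated in the abstract: the decoder consumes the same kind of discrete units whether they are derived from audio (unpaired branch) or supervised from text (paired branch), so untranscribed speech still contributes training signal to the shared decoder.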
Pages: 3191-3195
Page count: 5