Zero-shot multi-speaker accent TTS with limited accent data

Cited by: 0
Authors
Zhang, Mingyang [1 ]
Zhou, Yi [2 ]
Wu, Zhizheng [1 ]
Li, Haizhou [1 ,2 ]
Affiliations
[1] Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Sch Data Sci, Shenzhen, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
Funding
National Natural Science Foundation of China;
Keywords
SPEAKER ADAPTATION;
DOI
10.1109/APSIPAASC58517.2023.10317526
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a multi-speaker accented speech synthesis framework that can generate accented speech for unseen speakers using only a limited amount of accent training data. Without relying on an accent lexicon, the proposed network learns accent phoneme embeddings through simple model adaptation. Specifically, a standard multi-speaker speech synthesis model is first trained on native speech. An additional neural network module is then appended and adapted to map the native speech to the accented speech. In the experiments, we synthesized English speech with Singaporean and Hindi accents. Both objective and subjective evaluation results confirm that the proposed phoneme-mapping technique is effective in generating high-quality accented speech for unseen speakers.
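The abstract describes a two-stage recipe: train a standard multi-speaker TTS model on native speech, then append and adapt a small mapping module so native phoneme representations are transformed into accent-specific ones while the base model stays frozen. The toy sketch below (the class name, the linear-map formulation, and all hyperparameters are illustrative assumptions, not the authors' actual architecture) shows the core idea of such an adaptation module: a learnable linear map, initialized near identity, trained on the limited accent data while everything upstream is held fixed.

```python
class PhonemeMappingAdapter:
    """Illustrative sketch of an appended adaptation module that maps
    native phoneme embeddings to accent phoneme embeddings.

    The frozen base TTS model is not modeled here; only the small
    trainable mapping layer is, using plain gradient descent on a
    squared-error objective.
    """

    def __init__(self, dim, lr=0.1):
        self.dim = dim
        self.lr = lr
        # Initialize the weight matrix near identity so the adapter
        # starts as a no-op on native embeddings.
        self.w = [[1.0 if i == j else 0.0 for j in range(dim)]
                  for i in range(dim)]

    def forward(self, x):
        # Linear map: y_i = sum_j w[i][j] * x[j]
        return [sum(self.w[i][j] * x[j] for j in range(self.dim))
                for i in range(self.dim)]

    def train_step(self, x, target):
        # One gradient-descent step on squared error between the
        # mapped native embedding and the accent embedding target.
        y = self.forward(x)
        err = [y[i] - target[i] for i in range(self.dim)]
        for i in range(self.dim):
            for j in range(self.dim):
                self.w[i][j] -= self.lr * err[i] * x[j]
        return sum(e * e for e in err)


if __name__ == "__main__":
    # Hypothetical 4-dimensional "native" and "accent" phoneme embeddings.
    adapter = PhonemeMappingAdapter(dim=4)
    native = [0.5, -0.2, 0.1, 0.3]
    accent = [0.1, 0.4, -0.3, 0.2]
    losses = [adapter.train_step(native, accent) for _ in range(50)]
    print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because only this small module is updated, the approach needs far less accent data than retraining the full synthesis model, which matches the limited-data setting the paper targets.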
Pages: 1931-1936 (6 pages)
Related Papers
50 records in total
  • [21] MULTI-SPEAKER MODELING AND SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4475 - 4479
  • [22] Comparative Study for Multi-Speaker Mongolian TTS with a New Corpus
    Liang, Kailin
    Liu, Bin
    Hu, Yifan
    Liu, Rui
    Bao, Feilong
    Gao, Guanglai
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (07):
  • [23] Hi-Fi Multi-Speaker English TTS Dataset
    Bakhturina, Evelina
    Lavrukhin, Vitaly
    Ginsburg, Boris
    Zhang, Yang
    [J]. INTERSPEECH 2021, 2021, : 2776 - 2780
  • [24] AISHELL-3: A Multi-Speaker Mandarin TTS Corpus
    Shi, Yao
    Bu, Hui
    Xu, Xin
    Zhang, Shaoji
    Li, Ming
    [J]. INTERSPEECH 2021, 2021, : 2756 - 2760
  • [25] Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models
    Levkovitch, Alon
    Nachmani, Eliya
    Wolf, Lior
    [J]. INTERSPEECH 2022, 2022, : 2983 - 2987
  • [26] Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS
    Udagawa, Kenta
    Saito, Yuki
    Saruwatari, Hiroshi
    [J]. INTERSPEECH 2022, 2022, : 2968 - 2972
  • [27] TTS-Guided Training for Accent Conversion Without Parallel Data
    Zhou, Yi
    Wu, Zhizheng
    Zhang, Mingyang
    Tian, Xiaohai
    Li, Haizhou
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 533 - 537
  • [28] Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows
    Valles-Perez, Ivan
    Roth, Julian
    Beringer, Grzegorz
    Barra-Chicote, Roberto
    Droppo, Jasha
    [J]. INTERSPEECH 2021, 2021, : 3131 - 3135
  • [29] Multi-accent Speech Separation with One Shot Learning
    Huang, Kuan Po
    Wu, Yuan-Kuei
    Lee, Hung-yi
    [J]. 1ST WORKSHOP ON META LEARNING AND ITS APPLICATIONS TO NATURAL LANGUAGE PROCESSING (METANLP 2021), 2021, : 59 - 66
  • [30] CAN WE USE COMMON VOICE TO TRAIN A MULTI-SPEAKER TTS SYSTEM?
    Ogun, Sewade
    Colotte, Vincent
    Vincent, Emmanuel
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 900 - 905