Zero-shot multi-speaker accent TTS with limited accent data

Cited by: 0
Authors
Zhang, Mingyang [1]
Zhou, Yi [2]
Wu, Zhizheng [1]
Li, Haizhou [1, 2]
Affiliations
[1] Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Sch Data Sci, Shenzhen, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
Funding
National Natural Science Foundation of China
Keywords
SPEAKER ADAPTATION
DOI
10.1109/APSIPAASC58517.2023.10317526
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a multi-speaker accent speech synthesis framework. It can generate accented speech for unseen speakers using only a limited amount of accent training data. Without relying on an accent lexicon, the proposed network learns the accent phoneme embedding via a simple model adaptation. Specifically, a standard multi-speaker speech synthesis model is first trained with native speech. Then, an additional neural network module is appended for adaptation to map the native speech to the accented speech. In the experiments, we synthesized English speech with Singapore and Hindi accents. Both objective and subjective evaluation results confirm that the proposed technique with phoneme mapping is effective in generating high-quality accented speech for unseen speakers.
Pages: 1931-1936
Page count: 6
Related papers
50 in total
  • [1] Towards Zero-Shot Multi-Speaker Multi-Accent Text-to-Speech Synthesis
    Zhang, Mingyang
    Zhou, Xuehao
    Wu, Zhizheng
    Li, Haizhou
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 947 - 951
  • [2] Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
    Jeon, Yejin
    Kim, Yunsu
    Lee, Gary Geunbae
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18336 - 18344
  • [3] YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone
    Casanova, Edresson
    Weber, Julian
    Shulby, Christopher
    Candido Junior, Arnaldo
    Goelge, Eren
    Ponti, Moacir Antonelli
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [4] Normalization Driven Zero-shot Multi-Speaker Speech Synthesis
    Kumar, Neeraj
    Goel, Srishti
    Narang, Ankur
    Lall, Brejesh
    [J]. INTERSPEECH 2021, 2021, : 1354 - 1358
  • [5] Zero-Shot Normalization Driven Multi-Speaker Text to Speech Synthesis
    Kumar, Neeraj
    Narang, Ankur
    Lall, Brejesh
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1679 - 1693
  • [6] Zero-Shot Foreign Accent Conversion without a Native Reference
    Quamer, Waris
    Das, Anurag
    Levis, John
    Chukharev-Hudilainen, Evgeny
    Gutierrez-Osuna, Ricardo
    [J]. INTERSPEECH 2022, 2022, : 4920 - 4924
  • [7] Zero-Shot vs. Few-Shot Multi-speaker TTS Using Pre-trained Czech SpeechT5 Model
    Lehecka, Jan
    Hanzlicek, Zdenek
    Matousek, Jindrich
    Tihelka, Daniel
    [J]. TEXT, SPEECH, AND DIALOGUE, TSD 2024, PT II, 2024, 15049 : 46 - 57
  • [8] ZERO-SHOT MULTI-SPEAKER TEXT-TO-SPEECH WITH STATE-OF-THE-ART NEURAL SPEAKER EMBEDDINGS
    Cooper, Erica
    Lai, Cheng-I
    Yasuda, Yusuke
    Fang, Fuming
    Wang, Xin
    Chen, Nanxin
    Yamagishi, Junichi
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6184 - 6188
  • [9] Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech
    Choi, Byoung Jin
    Jeong, Myeonghun
    Kim, Minchan
    Mun, Sung Hwan
    Kim, Nam Soo
    [J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1708 - 1712
  • [10] Effective Zero-Shot Multi-Speaker Text-to-Speech Technique Using Information Perturbation and a Speaker Encoder
    Bang, Chae-Woon
    Chun, Chanjun
    [J]. SENSORS, 2023, 23 (23)