Zero-shot multi-speaker accent TTS with limited accent data

Authors
Zhang, Mingyang [1]
Zhou, Yi [2]
Wu, Zhizheng [1]
Li, Haizhou [1,2]
Affiliations
[1] Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Sch Data Sci, Shenzhen, Peoples R China
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
Funding
National Natural Science Foundation of China;
Keywords
SPEAKER ADAPTATION;
DOI
10.1109/APSIPAASC58517.2023.10317526
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present a multi-speaker accent speech synthesis framework that can generate accented speech for unseen speakers using only a limited amount of accent training data. Without relying on an accent lexicon, the proposed network learns accent phoneme embeddings via a simple model adaptation. Specifically, a standard multi-speaker speech synthesis model is first trained on native speech. Then, an additional neural network module is appended for adaptation to map the native speech to the accented speech. In the experiments, we synthesized English speech with Singapore and Hindi accents. Both objective and subjective evaluation results confirm that the proposed technique with phoneme mapping is effective in generating high-quality accented speech for unseen speakers.
Pages: 1931-1936
Page count: 6