Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation

Cited by: 11
Authors
Zhou, Yi [1]
Tian, Xiaohai [2]
Li, Haizhou [1,3]
Affiliations
[1] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117583, Singapore
[2] Bytedance AI Lab, Speech & Audio Dept, Singapore 048583, Singapore
[3] Chinese Univ Hong Kong Shenzhen, Sch Data Sci, Shenzhen 518172, Peoples R China
Funding
National Research Foundation, Singapore;
Keywords
Task analysis; Speech processing; Decoding; Training; Speech enhancement; Speaker recognition; Encoding; Language agnostic; speaker embedding; cross-lingual; personalized speech generation; DEEP NEURAL-NETWORKS; VOICE CONVERSION;
DOI
10.1109/TASLP.2021.3125142
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Cross-lingual personalized speech generation seeks to synthesize a target speaker's voice from only a few training samples that are in a different language. One popular technique is to condition a speech synthesizer on a speaker embedding that characterizes the target speaker. Unfortunately, such a speaker embedding is usually affected by the language being spoken, which compromises the speaker similarity in cross-lingual personalized speech generation. In this paper, we propose a novel speaker encoding mechanism that learns a language agnostic speaker embedding to characterize speaker individuality. Specifically, we adopt an encoder-decoder architecture to disentangle the language information from speaker embeddings via multi-task learning. We conduct experiments on both voice conversion and text-to-speech synthesis between English and Mandarin that involve cross-lingual speech generation. All objective and subjective evaluations consistently confirm that the proposed speaker embedding is language agnostic, thus improving cross-lingual personalized speech generation in terms of speaker similarity.
Pages: 3427 - 3439
Page count: 13
Related papers
50 items in total
  • [1] Cross-lingual, Multi-speaker Text-To-Speech Synthesis Using Neural Speaker Embedding
    Chen, Mengnan
    Chen, Minchuan
    Liang, Shuang
    Ma, Jun
    Chen, Lei
    Wang, Shaojun
    Xiao, Jing
    INTERSPEECH 2019, 2019, : 2105 - 2109
  • [2] DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
    Liu, Sen
    Guo, Yiwei
    Du, Chenpeng
    Chen, Xie
    Yu, Kai
    INTERSPEECH 2023, 2023, : 616 - 620
  • [3] Detecting Hate Speech in Cross-Lingual and Multi-lingual Settings Using Language Agnostic Representations
    Rodriguez, Sebastian E.
    Allende-Cid, Hector
    Allende, Hector
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2021, 2021, 12702 : 77 - 87
  • [4] LAPCA: Language-Agnostic Pretraining with Cross-Lingual Alignment
    Abulkhanov, Dmitry
    Sorokin, Nikita
    Nikolenko, Sergey
    Malykh, Valentin
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 2098 - 2102
  • [5] Cross-Lingual Speaker Discrimination Using Natural and Synthetic Speech
    Wester, Mirjam
    Liang, Hui
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2492 - 2495
  • [6] CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS
    Wu, Yi-Jian
    King, Simon
    Tokuda, Keiichi
    2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 9 - 12
  • [7] UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS
    Oura, Keiichiro
    Tokuda, Keiichi
    Yamagishi, Junichi
    King, Simon
    Wester, Mirjam
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4594 - 4597
  • [8] Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited Data
    Sarfjoo, Seyyed Saeed
    Demiroglu, Cenk
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 317 - 321
  • [9] TACKLING THE SCORE SHIFT IN CROSS-LINGUAL SPEAKER VERIFICATION BY EXPLOITING LANGUAGE INFORMATION
    Thienpondt, Jenthe
    Desplanques, Brecht
    Demuynck, Kris
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7187 - 7191
  • [10] ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation
    Maurya, Kaushal Kumar
    Desarkar, Maunendra Sankar
    Kano, Yoshinobu
    Deepshikha, Kumari
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2804 - 2818