GENERATING MULTILINGUAL VOICES USING SPEAKER SPACE TRANSLATION BASED ON BILINGUAL SPEAKER DATA

Cited by: 0
Authors
Maiti, Soumi [1, 2]
Marchi, Erik [1 ]
Conkie, Alistair [1 ]
Affiliations
[1] Apple, Cupertino, CA USA
[2] CUNY, Grad Ctr, New York, NY 10021 USA
Keywords
cross-lingual transfer; d-vector; speaker space manipulation; bilingual speaker; text-to-speech synthesis
DOI
10.1109/icassp40776.2020.9054305
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present progress towards bilingual Text-to-Speech which is able to transform a monolingual voice to speak a second language while preserving speaker voice quality. We demonstrate that a bilingual speaker embedding space contains a separate distribution for each language and that a simple transform in speaker space generated by the speaker embedding can be used to control the degree of accent of a synthetic voice in a language. The same transform can be applied even to monolingual speakers. In our experiments speaker data from an English-Spanish (Mexican) bilingual speaker was used, and the goal was to enable English speakers to speak Spanish and Spanish speakers to speak English. We found that the simple transform was sufficient to convert a voice from one language to the other with a high degree of naturalness. In one case the transformed voice outperformed a native language voice in listening tests. Experiments further indicated that the transform preserved many of the characteristics of the original voice. The degree of accent present can be controlled and naturalness is relatively consistent across a range of accent values.
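The abstract does not state the exact form of the "simple transform", so the following is only a minimal sketch under a stated assumption: that the transform is a translation of a speaker embedding (d-vector) along the direction between the mean embeddings of the bilingual speaker's two languages, scaled by a factor `alpha` that plays the role of the accent control. The function names, the toy 2-D vectors, and `alpha` itself are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty set of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def shift_speaker(
    embedding: Sequence[float],
    source_mean: Sequence[float],
    target_mean: Sequence[float],
    alpha: float,
) -> Vector:
    """Translate an embedding along (target_mean - source_mean).

    ASSUMPTION: a pure translation is used here for illustration only;
    the paper's actual transform may differ. alpha = 0 leaves the
    embedding unchanged, alpha = 1 applies the full shift.
    """
    return [
        e + alpha * (t - s)
        for e, s, t in zip(embedding, source_mean, target_mean)
    ]


# Toy 2-D stand-ins for the bilingual speaker's per-language d-vectors.
english_vectors = [[0.0, 1.0], [2.0, 1.0]]
spanish_vectors = [[3.0, 1.0], [5.0, 1.0]]

english_mean = mean_vector(english_vectors)
spanish_mean = mean_vector(spanish_vectors)

# A hypothetical monolingual English speaker, moved halfway toward Spanish.
monolingual_speaker = [1.0, 2.0]
half_shifted = shift_speaker(monolingual_speaker, english_mean, spanish_mean, 0.5)
fully_shifted = shift_speaker(monolingual_speaker, english_mean, spanish_mean, 1.0)
```

In this sketch, intermediate values of `alpha` correspond to the abstract's statement that the degree of accent can be controlled, and the offset of the speaker from the language mean is left untouched, which is one way the original voice characteristics could be preserved.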
Pages: 7624-7628
Page count: 5