TVQVC: Transformer based Vector Quantized Variational Autoencoder with CTC loss for Voice Conversion

Cited by: 0
Authors
Chen, Ziyi [1, 2]
Zhang, Pengyuan [1, 2]
Affiliations
[1] Chinese Acad Sci, Key Lab Speech Acoust & Content Understanding, Inst Acoust, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
INTERSPEECH 2021
Keywords
voice conversion; vector quantization; transformer; CTC
DOI
10.21437/Interspeech.2021-1301
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Voice conversion (VC) techniques aim to modify the speaker identity and style of an utterance while preserving its linguistic content. Although many VC methods exist, the state of the art is still a cascade of automatic speech recognition (ASR) and text-to-speech (TTS). This paper presents a new transformer-based vector-quantized autoencoder structure with CTC loss for non-parallel VC, inspired by the cascaded ASR and TTS approach. The proposed method combines CTC loss and vector quantization to obtain high-level linguistic information free of speaker information. Objective and subjective evaluations on Mandarin datasets show that speech converted by the proposed model is better than the baselines in naturalness, rhythm, and speaker similarity.
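The abstract names two ingredients, vector quantization and CTC loss, without giving implementation details. As a rough illustration only, the sketch below shows what each computes on toy data in plain Python. The function names `quantize` and `ctc_neg_log_likelihood`, the toy codebook, and the per-frame probabilities are assumptions made for this example; the paper's transformer-based architecture and training setup are not reproduced here.

```python
import math


def quantize(frames, codebook):
    """Map each frame vector to the index of its nearest codebook vector."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))
        for frame in frames
    ]


def ctc_neg_log_likelihood(probs, target, blank=0):
    """CTC loss for one utterance, computed with the forward algorithm.

    probs[t][c] is the probability of class c at frame t. target is a
    non-empty list of label ids (no blanks) that must be short enough to
    be emitted within len(probs) frames.
    """
    # Interleave blanks around the labels: [b, l1, b, l2, b, ...].
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    # alpha[s] = total probability of all partial alignments ending at ext[s].
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    return -math.log(alpha[-1] + alpha[-2])


# Toy usage: three 2-D frames, a two-entry codebook (hypothetical values).
codes = quantize([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]], [[0.0, 0.0], [1.0, 1.0]])
# Two frames with uniform probabilities over {blank, label 1}, target [1].
loss = ctc_neg_log_likelihood([[0.5, 0.5], [0.5, 0.5]], [1])
```

In a model of the kind the abstract describes, the quantized codes would presumably serve as the speaker-independent content representation, with the CTC term pushing that representation toward linguistic units; how the two are wired together in the paper is not stated in this record.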
Pages: 826-830
Page count: 5
Related papers
50 in total
  • [1] Connectionist temporal classification loss for vector quantized variational autoencoder in zero-shot voice conversion
    Kang, Xiao
    Huang, Hao
    Hu, Ying
    Huang, Zhihua
    DIGITAL SIGNAL PROCESSING, 2021, 116
  • [2] CRANK: AN OPEN-SOURCE SOFTWARE FOR NONPARALLEL VOICE CONVERSION BASED ON VECTOR-QUANTIZED VARIATIONAL AUTOENCODER
    Kobayashi, Kazuhiro
    Huang, Wen-Chin
    Wu, Yi-Chiao
    Tobing, Patrick Lumban
    Hayashi, Tomoki
    Toda, Tomoki
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5934 - 5938
  • [3] Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion
    Ding, Shaojin
    Gutierrez-Osuna, Ricardo
    INTERSPEECH 2019, 2019, : 724 - 728
  • [4] Vector-Quantized Variational AutoEncoder for pansharpening
    Talbi, Farid
    Elmezouar, Miloud Chikr
    Boutellaa, Elhocine
    Alim, Fatiha
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2023, 44 (20) : 6329 - 6349
  • [5] Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion
    Huang, Wen-Chin
    Wu, Yi-Chiao
    Hwang, Hsin-Te
    Tobing, Patrick Lumban
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    Tsao, Yu
    Wang, Hsin-Min
    2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [6] Predictive Vector Quantized Variational AutoEncoder for Spectral Envelope Quantization
    Srikotr, Tanasan
    Mano, Kazunori
    2020 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2020,
  • [7] Conditional Deep Hierarchical Variational Autoencoder for Voice Conversion
    Akuzawa, Kei
    Onishi, Kotaro
    Takiguchi, Keisuke
    Mametani, Kohki
    Mori, Koichiro
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 808 - 813
  • [8] Data augmentation for Gram-stain images based on Vector Quantized Variational AutoEncoder
    Shwetha, V
    Prasad, Keerthana
    Mukhopadhyay, Chiranjay
    Banerjee, Barnini
    NEUROCOMPUTING, 2024, 600
  • [9] The Multilayer Perceptron Vector Quantized Variational AutoEncoder for Spectral Envelope Quantization
    Srikotr, Tanasan
    Mano, Kazunori
    2020 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE), 2020, : 348 - 353
  • [10] Non-Parallel Voice Conversion with Cyclic Variational Autoencoder
    Tobing, Patrick Lumban
    Wu, Yi-Chiao
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    INTERSPEECH 2019, 2019, : 674 - 678