TVQVC: Transformer based Vector Quantized Variational Autoencoder with CTC loss for Voice Conversion

Cited by: 0
Authors
Chen, Ziyi [1 ,2 ]
Zhang, Pengyuan [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Key Lab Speech Acoust & Content Understanding, Inst Acoust, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
INTERSPEECH 2021
Keywords
voice conversion; vector quantization; transformer; CTC
DOI
10.21437/Interspeech.2021-1301
Chinese Library Classification (CLC) numbers
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
Voice conversion (VC) techniques aim to modify the speaker identity and style of an utterance while preserving its linguistic content. Although many VC methods exist, the state of the art is still to cascade automatic speech recognition (ASR) and text-to-speech (TTS). This paper presents a new vector-quantized autoencoder structure based on the transformer with CTC loss for non-parallel VC, inspired by the cascaded ASR-TTS approach. Our proposed method combines CTC loss and vector quantization to obtain high-level linguistic information free of speaker information. Objective and subjective evaluations on Mandarin datasets show that the speech converted by our proposed model surpasses the baselines in naturalness, rhythm and speaker similarity.
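The abstract describes combining a transformer encoder, a vector-quantization bottleneck and a CTC loss so that the discrete codes carry linguistic rather than speaker information. The PyTorch sketch below illustrates one way such a content encoder could be wired together; it is not the authors' TVQVC implementation, and the class name VQContentEncoder, the codebook size, layer counts and loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VQContentEncoder(nn.Module):
    """Transformer encoder + VQ bottleneck, trained with an auxiliary CTC loss
    so the discrete codes capture linguistic content rather than speaker traits.
    All sizes below are illustrative placeholders, not the paper's settings."""

    def __init__(self, feat_dim=80, d_model=256, codebook_size=128, num_phones=100):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.codebook = nn.Embedding(codebook_size, d_model)   # discrete content codes
        self.ctc_head = nn.Linear(d_model, num_phones + 1)     # +1 for the CTC blank
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def quantize(self, z):
        # Nearest-neighbour codebook lookup (VQ-VAE style).
        codes = self.codebook.weight                            # (K, D)
        dist = (z.pow(2).sum(-1, keepdim=True)
                - 2 * z @ codes.t()
                + codes.pow(2).sum(-1))                         # (B, T, K)
        q = self.codebook(dist.argmin(-1))
        # Codebook + commitment losses, straight-through estimator for gradients.
        vq_loss = F.mse_loss(q, z.detach()) + 0.25 * F.mse_loss(z, q.detach())
        return z + (q - z).detach(), vq_loss

    def forward(self, feats, feat_lens, phones, phone_lens):
        z = self.encoder(self.proj(feats))                      # (B, T, D)
        q, vq_loss = self.quantize(z)
        log_probs = self.ctc_head(q).log_softmax(-1)            # (B, T, num_phones+1)
        ctc_loss = self.ctc(log_probs.transpose(0, 1),          # CTC expects (T, B, C)
                            phones, feat_lens, phone_lens)
        return q, ctc_loss + vq_loss                            # content codes + auxiliary loss
```

In a complete VC pipeline, a decoder conditioned on a target-speaker representation would reconstruct acoustic features from the returned content codes.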
Pages: 826-830
Number of pages: 5
Related papers
50 records in total
  • [41] Cross-Lingual Voice Conversion With Controllable Speaker Individuality Using Variational Autoencoder and Star Generative Adversarial Network
    Ho, Tuan Vu
    Akagi, Masato
    IEEE ACCESS, 2021, 9 : 47503 - 47515
  • [42] HMM-Based Voice Conversion Using Quantized F0 Context
    Nose, Takashi
    Ota, Yuhei
    Kobayashi, Takao
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): : 2483 - 2490
  • [43] A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis
    Wang, Xin
    Takaki, Shinji
    Yamagishi, Junichi
    King, Simon
    Tokuda, Keiichi
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 157 - 170
  • [44] Nontechnical Loss Detection Based on Stacked Uncorrelating Autoencoder and Support Vector Machine
    Hu T.
    Guo Q.
    Sun H.
    Dianli Xitong Zidonghua/Automation of Electric Power Systems, 2019, 43 (01): : 119 - 125
  • [45] A Transformer-Based Hierarchical Variational AutoEncoder Combined Hidden Markov Model for Long Text Generation
    Zhao, Kun
    Ding, Hongwei
    Ye, Kai
    Cui, Xiaohui
    ENTROPY, 2021, 23 (10)
  • [46] DATA AUGMENTATION FOR MONAURAL SINGING VOICE SEPARATION BASED ON VARIATIONAL AUTOENCODER-GENERATIVE ADVERSARIAL NETWORK
    He, Boxin
    Wang, Shengbei
    Yuan, Weitao
    Wang, Jianming
    Unoki, Masashi
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1354 - 1359
  • [47] Variational Autoencoder-Based Multiobjective Topology Optimization of Electrical Machines Using Vector Graphics
    Heroth, Michael
    Schmid, Helmut C.
    Gregorova, Magda
    Herrler, Rainer
    Hofmann, Wilfried
    IEEE Access, 2024, 12 : 184813 - 184826
  • [48] Vector-Quantization Variational Autoencoder Based Data Rate Reduction for Wireless Ultrasound Imaging Systems
    Bastola, Sulav
    Tekes, Coskun
    SOUTHEASTCON 2024, 2024, : 1426 - 1431
  • [49] VQ-DcTr: Vector Quantized Autoencoder With Dual-channel Transformer Points Splitting for 3D Point Cloud Completion
    Fei, Ben
    Yang, Weidong
    Chen, Wen-Ming
    Ma, Lipeng
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4769 - 4778
  • [50] Speaker-independent HMM-based Voice Conversion Using Quantized Fundamental Frequency
    Nose, Takashi
    Kobayashi, Takao
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1724 - 1727