TVQVC: Transformer based Vector Quantized Variational Autoencoder with CTC loss for Voice Conversion

Cited by: 0
Authors
Chen, Ziyi [1, 2]
Zhang, Pengyuan [1, 2]
Affiliations
[1] Chinese Acad Sci, Key Lab Speech Acoust & Content Understanding, Inst Acoust, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
Keywords
voice conversion; vector quantization; transformer; CTC
DOI
10.21437/Interspeech.2021-1301
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
Voice conversion (VC) techniques aim to modify the speaker identity and style of an utterance while preserving its linguistic content. Although many VC methods exist, the state of the art is still a cascade of automatic speech recognition (ASR) and text-to-speech (TTS). This paper presents a new transformer-based vector-quantized autoencoder structure with CTC loss for non-parallel VC, inspired by the cascaded ASR and TTS approach. The proposed method combines CTC loss and vector quantization to obtain high-level linguistic information free of speaker information. Objective and subjective evaluations on Mandarin datasets show that speech converted by the proposed model is better than the baselines in naturalness, rhythm, and speaker similarity.
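The two building blocks named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `quantize` and `ctc_collapse`, the toy codebook, and the blank index 0 are assumptions made for illustration only. The first function shows the nearest-codebook lookup at the core of vector quantization; the second shows the many-to-one mapping that CTC uses to relate frame-level label paths to an output sequence (merge repeated labels, then drop blanks). In a trained model the codebook is learned and the CTC loss sums over all frame-level paths that collapse to the target text.

```python
from typing import List, Sequence


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each frame, the index of the nearest codebook vector.

    Distance is squared Euclidean. The codebook here is a hypothetical
    fixed table; a VQ autoencoder would learn it during training.
    """
    indices = []
    for frame in frames:
        distances = [
            sum((f - c) ** 2 for f, c in zip(frame, code))
            for code in codebook
        ]
        indices.append(distances.index(min(distances)))
    return indices


def ctc_collapse(labels: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse a frame-level CTC path into an output label sequence.

    Consecutive repeats are merged first, then blank symbols are dropped.
    The blank index 0 is an assumption of this sketch.
    """
    collapsed = []
    previous = None
    for label in labels:
        if label != previous and label != blank:
            collapsed.append(label)
        previous = label
    return collapsed


# Toy usage: three 2-D frames against a two-entry codebook,
# and one frame-level label path with blanks (0) and repeats.
toy_codes = quantize([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]],
                     [[0.0, 0.0], [1.0, 1.0]])
toy_labels = ctc_collapse([0, 1, 1, 0, 1, 2, 2, 0])
```

Note that a blank between two identical labels keeps them distinct after collapsing, which is how CTC represents genuinely repeated output symbols.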
Pages: 826-830
Number of pages: 5
Related papers
50 in total
  • [31] Low-Dose CT Image Reconstruction using Vector Quantized Convolutional Autoencoder with Perceptual Loss
    Ramanathan, Shalini
    Ramasundaram, Mohan
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2023, 48 (02):
  • [32] Low-Dose CT Image Reconstruction using Vector Quantized Convolutional Autoencoder with Perceptual Loss
    Shalini Ramanathan
    Mohan Ramasundaram
    Sādhanā, 48
  • [33] Bone-conducted Speech Enhancement Using Vector-quantized Variational Autoencoder and Gammachirp Filterbank Cepstral Coefficients
    Quoc-Huy Nguyen
    Unoki, Masashi
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 21 - 25
  • [34] WEAKLY SUPERVISED MARINE ANIMAL DETECTION FROM REMOTE SENSING IMAGES USING VECTOR-QUANTIZED VARIATIONAL AUTOENCODER
    Pham, Minh-Tan
    Gangloff, Hugo
    Lefevre, Sebastien
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 5559 - 5562
  • [35] Adaptive Transformer-Based Conditioned Variational Autoencoder for Incomplete Social Event Classification
    Li, Zhangming
    Qian, Shengsheng
    Cao, Jie
    Fang, Quan
    Xu, Changsheng
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 1698 - 1707
  • [36] T-CVAE: Transformer-Based Conditioned Variational Autoencoder for Story Completion
    Wang, Tianming
    Wan, Xiaojun
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 5233 - 5239
  • [37] MULTI-SPEAKER AND MULTI-DOMAIN EMOTIONAL VOICE CONVERSION USING FACTORIZED HIERARCHICAL VARIATIONAL AUTOENCODER
    Elgaar, Mohamed
    Park, Jungbae
    Lee, Sang Wan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7769 - 7773
  • [38] Transformer-Based Masked Autoencoder With Contrastive Loss for Hyperspectral Image Classification
    Cao, Xianghai
    Lin, Haifeng
    Guo, Shuaixu
    Xiong, Tao
    Jiao, Licheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [39] Unsupervised Anomaly Detection in Multivariate Time Series through Transformer-based Variational Autoencoder
    Zhang, Hongwei
    Xia, Yuanqing
    Yan, Tijin
    Liu, Guiyang
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 281 - 286
  • [40] Sentiment-Oriented Transformer-Based Variational Autoencoder Network for Live Video Commenting
    Fu, Fengyi
    Fang, Shancheng
    Chen, Weidong
    Mao, Zhendong
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (04)