TVQVC: Transformer based Vector Quantized Variational Autoencoder with CTC loss for Voice Conversion

Citations: 0
Authors
Chen, Ziyi [1, 2]
Zhang, Pengyuan [1, 2]
Affiliations
[1] Chinese Acad Sci, Key Lab Speech Acoust & Content Understanding, Inst Acoust, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
Keywords
voice conversion; vector quantization; transformer; CTC
DOI
10.21437/Interspeech.2021-1301
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Voice conversion (VC) techniques aim to modify the speaker identity and style of an utterance while preserving its linguistic content. Although many VC methods exist, the state of the art is still a cascade of automatic speech recognition (ASR) and text-to-speech (TTS). This paper presents a new transformer-based vector-quantized autoencoder structure with CTC loss for non-parallel VC, inspired by the cascaded ASR and TTS approach. The proposed method combines CTC loss and vector quantization to obtain high-level linguistic information free of speaker information. Objective and subjective evaluations on Mandarin datasets show that speech converted by the proposed model outperforms the baselines in naturalness, rhythm, and speaker similarity.
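As a rough illustration of the vector quantization step the abstract mentions, the sketch below maps each encoder frame to its nearest codebook entry. It is not the authors' implementation: the function names, the toy two-dimensional codebook, and the frame values are all assumptions made for this example, and the transformer encoder, the CTC loss, and training are omitted.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Replace every frame with its nearest codebook vector.

    Returns the chosen codebook indices and the quantized frames.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Nearest-neighbour search over the (hypothetical) codebook.
        idx = min(
            range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]),
        )
        indices.append(idx)
        quantized.append(list(codebook[idx]))
    return indices, quantized


# Toy data, assumed for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.4]]

indices, quantized = quantize(frames, codebook)
print(indices)    # one codebook index per frame
print(quantized)  # frames snapped to codebook vectors
```

Because every frame is snapped to one of a small set of shared vectors, fine-grained variation in the input is discarded at this bottleneck. The abstract attributes the removal of speaker information to the combination of this quantization with CTC loss.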
Pages: 826-830
Number of pages: 5