Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

Cited by: 105
Authors
Hsu, Chin-Cheng [1]
Hwang, Hsin-Te [1]
Wu, Yi-Chiao [1]
Tsao, Yu [2]
Wang, Hsin-Min [1]
Affiliations
[1] Academia Sinica, Institute of Information Science, Taipei, Taiwan
[2] Academia Sinica, Research Center for Information Technology Innovation, Taipei, Taiwan
DOI
10.1109/APSIPA.2016.7820786
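Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract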
We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence for learning conversion functions or for synthesizing a target spectrum with the aid of alignments. However, these requirements gravely limit the scope of practical applications of SC due to scarcity or even unavailability of parallel corpora. We propose an SC framework based on variational auto-encoder which enables us to exploit non-parallel corpora. The framework comprises an encoder that learns speaker-independent phonetic representations and a decoder that learns to reconstruct the designated speaker. It removes the requirement of parallel corpora or phonetic alignments to train a spectral conversion system. We report objective and subjective evaluations to validate our proposed method and compare it to SC methods that have access to aligned corpora.
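The encoder/decoder split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the dimensions, the random untrained weights, the linear layers, and every function name below are hypothetical, and using the latent mean at conversion time is an assumption made for simplicity. The sketch only shows why no alignment is needed: the encoder sees a single frame without any speaker label, and conversion amounts to decoding the same latent code with a different speaker code.

```python
import math
import random

random.seed(0)

FRAME_DIM = 4      # toy spectral frame size (hypothetical)
LATENT_DIM = 2     # toy latent size (hypothetical)
NUM_SPEAKERS = 2   # toy speaker inventory (hypothetical)


def rand_matrix(rows, cols):
    """Small random matrix standing in for learned parameters."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Untrained placeholder weights; a real system would learn these.
W_MU = rand_matrix(LATENT_DIM, FRAME_DIM)
W_LOGVAR = rand_matrix(LATENT_DIM, FRAME_DIM)
W_DEC = rand_matrix(FRAME_DIM, LATENT_DIM + NUM_SPEAKERS)


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def encode(frame):
    """Return mean and log-variance of q(z | x); no speaker input is used."""
    return matvec(W_MU, frame), matvec(W_LOGVAR, frame)


def sample_latent(mu, logvar):
    """Reparameterisation: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def decode(z, speaker_id):
    """Reconstruct a frame from the latent code plus a one-hot speaker code."""
    return matvec(W_DEC, z + one_hot(speaker_id, NUM_SPEAKERS))


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), the VAE regulariser."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def convert(frame, target_speaker):
    """Encode a source frame, then decode it with the target speaker code."""
    mu, _ = encode(frame)  # assumption: use the mean, no sampling, at test time
    return decode(mu, target_speaker)


source_frame = [0.1, -0.2, 0.3, 0.0]
converted = convert(source_frame, target_speaker=1)
```

During training, each frame would be reconstructed with its own speaker code while the KL term regularises the latent space, so each training example involves only one speaker's frame and no frame-wise pairing across speakers.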
Pages: 6