Many-to-Many Voice Conversion based on Bottleneck Features with Variational Autoencoder for Non-parallel Training Data

Cited by: 0
Authors
Li, Yanping [1 ,4 ]
Lee, Kong Aik [2 ]
Yuan, Yougen [3 ]
Li, Haizhou [4 ]
Yang, Zhen [1 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China
[2] NEC Corp Ltd, Data Sci Res Labs, Tokyo, Japan
[3] Northwestern Polytech Univ, Xian, Shaanxi, Peoples R China
[4] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
This paper proposes a novel approach to many-to-many (M2M) voice conversion for non-parallel training data. In the proposed approach, we first obtain bottleneck features (BNFs) as speaker representations from a deep neural network (DNN). Then, a variational autoencoder (VAE) implements the mapping function (i.e., a reconstruction process) using both the latent semantic information and the speaker representations. Furthermore, we propose an adaptive scheme that intervenes in the training process of the DNN, which can enrich the target speaker's personality feature space when training data are limited. Our approach has three advantages: 1) neither parallel training data nor an explicit frame-alignment process is required; 2) it consolidates multiple pair-wise systems into a single M2M model (many source speakers to many target speakers); 3) it extends the M2M conversion task from a closed set to an open set when the training data of the target speaker are very limited. Objective and subjective evaluations show that the proposed approach outperforms the baseline system.
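To make the conditioning idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation: a frame-level conditional VAE in PyTorch whose decoder combines a latent content code with a speaker representation standing in for the DNN bottleneck features. All module names, layer sizes, and the use of mel-spectrogram frames are illustrative assumptions.

# Minimal sketch of a conditional VAE for voice conversion (assumed architecture).
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    def __init__(self, feat_dim=80, bnf_dim=64, latent_dim=16):
        super().__init__()
        # Encoder: spectral frame -> parameters of the latent content distribution
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        # Decoder: latent content + speaker representation (BNF) -> spectral frame
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + bnf_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, x, bnf):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterisation trick: sample z from N(mu, sigma^2)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        x_hat = self.decoder(torch.cat([z, bnf], dim=-1))
        return x_hat, mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term plus KL divergence to a standard normal prior
    recon = nn.functional.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Conversion-time usage (assumed): encode a source frame, then decode it with the
# target speaker's representation to obtain the converted frame.
model = ConditionalVAE()
src_frame = torch.randn(1, 80)   # e.g. one mel-spectrogram frame (assumed)
tgt_bnf = torch.randn(1, 64)     # target speaker's bottleneck features (assumed)
converted, _, _ = model(src_frame, tgt_bnf)

The key design point, as described in the abstract, is that the speaker representation enters only at the decoder, so a single model can be reused for any source-target speaker pair by swapping the conditioning vector.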
Pages: 829 - 833
Number of pages: 5