Many-to-Many Voice Conversion based on Bottleneck Features with Variational Autoencoder for Non-parallel Training Data

Cited by: 0
Authors
Li, Yanping [1, 4]
Lee, Kong Aik [2]
Yuan, Yougen [3]
Li, Haizhou [4]
Yang, Zhen [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China
[2] NEC Corp Ltd, Data Sci Res Labs, Tokyo, Japan
[3] Northwestern Polytech Univ, Xian, Shaanxi, Peoples R China
[4] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
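Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809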
Abstract
This paper proposes a novel approach to many-to-many (M2M) voice conversion for non-parallel training data. In the proposed approach, we first obtain bottleneck features (BNFs) from a deep neural network (DNN) to serve as speaker representations. Then, a variational autoencoder (VAE) implements the mapping function (i.e., a reconstruction process) using both the latent semantic information and the speaker representations. Furthermore, we propose an adaptive scheme that intervenes in the training process of the DNN, which can enrich the target speaker's personality feature space when training data are limited. Our approach has three advantages: 1) it requires neither parallel training data nor an explicit frame alignment process; 2) it consolidates multiple pair-wise systems into a single M2M model (many source speakers to many target speakers); 3) it expands the M2M conversion task from a closed set to an open set when the training data of the target speaker are very limited. The objective and subjective evaluations show that our proposed approach outperforms the baseline system.
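The pipeline summarized in the abstract (a speaker vector taken from a DNN bottleneck layer, fed together with a speaker-independent latent code into a VAE decoder) can be illustrated with a toy, untrained sketch. Everything in it is an assumption for illustration only: the dimensions, the single linear layers, the frame-averaging of bottleneck activations, and the names `bottleneck_embedding` and `ConditionalVAE` are hypothetical and are not taken from the paper. Training (gradient updates) and the adaptive DNN scheme are omitted; the point is only to show where the speaker representation enters the reconstruction and how conversion swaps it for a target speaker's vector.

```python
import math
import random

# Hypothetical toy sizes; the abstract does not state the real dimensions.
FEAT_DIM = 4     # spectral features per frame (assumed)
LATENT_DIM = 2   # VAE latent ("semantic") dimension (assumed)
SPK_DIM = 3      # bottleneck speaker-feature dimension (assumed)


def rand_matrix(rows, cols, rng):
    """Random weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(w, x):
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def bottleneck_embedding(frames, w_bn):
    """Hypothetical speaker vector: tanh activations of a narrow 'bottleneck'
    layer, averaged over all frames of an utterance. In a real system w_bn
    would come from a trained DNN; here it is random."""
    acts = [[math.tanh(v) for v in matvec(w_bn, f)] for f in frames]
    return [sum(col) / len(acts) for col in zip(*acts)]


class ConditionalVAE:
    """Single-layer linear VAE whose decoder is conditioned on a speaker vector."""

    def __init__(self, rng):
        self.rng = rng
        self.w_mu = rand_matrix(LATENT_DIM, FEAT_DIM, rng)
        self.w_logvar = rand_matrix(LATENT_DIM, FEAT_DIM, rng)
        self.w_dec = rand_matrix(FEAT_DIM, LATENT_DIM + SPK_DIM, rng)

    def encode(self, x):
        """Posterior mean and log-variance of q(z | x)."""
        return matvec(self.w_mu, x), matvec(self.w_logvar, x)

    def sample(self, mu, logvar):
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z, spk):
        # The decoder sees the latent code concatenated with the speaker vector.
        return matvec(self.w_dec, z + spk)

    def loss(self, x, spk):
        """Negative ELBO for one frame: squared-error reconstruction + KL to N(0, I)."""
        mu, logvar = self.encode(x)
        x_hat = self.decode(self.sample(mu, logvar), spk)
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        kl = -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
        return recon + kl

    def convert(self, x, target_spk):
        # Conversion: keep the source frame's posterior mean, swap in the target speaker.
        mu, _ = self.encode(x)
        return self.decode(mu, target_spk)
```

A possible usage, again with random weights: compute `target_spk = bottleneck_embedding(target_frames, w_bn)` from a few target-speaker frames, then map each source frame with `vae.convert(frame, target_spk)`. Because only the speaker vector changes between targets, one model can serve many source/target pairs, which is the property the abstract's second advantage describes.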
Pages: 829-833
Number of pages: 5
Related papers
50 in total
  • [11] Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition
    Ding, Shaojin
    Zhao, Guanlong
    Gutierrez-Osuna, Ricardo
    INTERSPEECH 2020, 2020, pp. 776-780
  • [12] NON-PARALLEL MANY-TO-MANY VOICE CONVERSION BY KNOWLEDGE TRANSFER FROM A TEXT-TO-SPEECH MODEL
    Yu, Xinyuan
    Mak, Brian
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, pp. 5924-5928
  • [13] Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks
    Zhao, Shengkui
    Nguyen, Trung Hieu
    Wang, Hao
    Ma, Bin
    INTERSPEECH 2019, 2019, pp. 689-693
  • [14] Non-Parallel Voice Conversion with Cyclic Variational Autoencoder
    Tobing, Patrick Lumban
    Wu, Yi-Chiao
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    INTERSPEECH 2019, 2019, pp. 674-678
  • [15] STARGAN-VC: NON-PARALLEL MANY-TO-MANY VOICE CONVERSION USING STAR GENERATIVE ADVERSARIAL NETWORKS
    Kameoka, Hirokazu
    Kaneko, Takuhiro
    Tanaka, Kou
    Hojo, Nobukatsu
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, pp. 266-273
  • [16] REMAP, WARP AND ATTEND: NON-PARALLEL MANY-TO-MANY ACCENT CONVERSION WITH NORMALIZING FLOWS
    Ezzerg, Abdelhamid
    Merritt, Thomas
    Yanagisawa, Kayoko
    Bilinski, Piotr
    Proszewska, Magdalena
    Pokora, Kamil
    Korzeniowski, Renard
    Barra-Chicote, Roberto
    Korzekwa, Daniel
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, pp. 984-990
  • [17] Many-to-Many and Completely Parallel-Data-Free Voice Conversion Based on Eigenspace DNN
    Hashimoto, Tetsuya
    Saito, Daisuke
    Minematsu, Nobuaki
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27(2), pp. 332-341
  • [18] Non-parallel Voice Conversion with Controllable Speaker Individuality using Variational Autoencoder
    Ho, Tuan Vu
    Akagi, Masato
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, pp. 106-111
  • [19] Many-to-many voice conversion with sentence embedding based on VAACGAN
    Li, Yanping
    Cao, Pan
    Shi, Yang
    Zhang, Yan
    Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2021, 47(3), pp. 500-508
  • [20] Many-to-many Voice Conversion Based on Multiple Non-negative Matrix Factorization
    Aihara, Ryo
    Takiguchi, Tetsuya
    Ariki, Yasuo
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, pp. 2749-2753