Many-to-Many Voice Conversion based on Bottleneck Features with Variational Autoencoder for Non-parallel Training Data

Cited by: 0
Authors
Li, Yanping [1, 4]
Lee, Kong Aik [2]
Yuan, Yougen [3]
Li, Haizhou [4]
Yang, Zhen [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China
[2] NEC Corp Ltd, Data Sci Res Labs, Tokyo, Japan
[3] Northwestern Polytech Univ, Xian, Shaanxi, Peoples R China
[4] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
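Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject classification codes
0808; 0809
Abstract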
This paper proposes a novel approach to many-to-many (M2M) voice conversion for non-parallel training data. In the proposed approach, we first obtain bottleneck features (BNFs) as speaker representations from a deep neural network (DNN). Then, a variational autoencoder (VAE) implements the mapping function (i.e., a reconstruction process) using both the latent semantic information and the speaker representations. Furthermore, we propose an adaptive scheme that intervenes in the training process of the DNN, which can enrich the target speaker's personality feature space when training data is limited. Our approach has three advantages: 1) it requires neither parallel training data nor an explicit frame alignment process; 2) it consolidates multiple pair-wise systems into a single M2M model (many source speakers to many target speakers); 3) it extends the M2M conversion task from a closed set to an open set when the training data of the target speaker is very limited. Objective and subjective evaluations show that the proposed approach outperforms the baseline system.
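The conversion idea in the abstract (encode a frame into a latent code, then reconstruct it conditioned on a speaker representation) can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the random linear layers standing in for trained networks, and the names `encode`, `reparameterize`, `decode`, and `convert` are all assumptions made for illustration, and the speaker BNF vector is simply drawn at random rather than extracted from a DNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
FEAT_DIM = 8     # spectral feature size per frame
LATENT_DIM = 4   # size of the latent code z
BNF_DIM = 3      # size of the speaker bottleneck feature

# Randomly initialised linear layers stand in for trained networks.
W_mu = rng.normal(size=(FEAT_DIM, LATENT_DIM))
W_logvar = rng.normal(size=(FEAT_DIM, LATENT_DIM))
W_dec = rng.normal(size=(LATENT_DIM + BNF_DIM, FEAT_DIM))


def encode(x):
    """Map frames to the mean and log-variance of q(z | x)."""
    return x @ W_mu, x @ W_logvar


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (used during training)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def decode(z, speaker_bnf):
    """Reconstruct frames from z concatenated with a speaker representation."""
    cond = np.broadcast_to(speaker_bnf, (z.shape[0], speaker_bnf.shape[-1]))
    return np.concatenate([z, cond], axis=1) @ W_dec


def convert(x_source, target_bnf):
    """Encode source frames, then decode with the target speaker's BNF."""
    mu, _ = encode(x_source)  # assumption: use the mean at conversion time
    return decode(mu, target_bnf)


frames = rng.normal(size=(5, FEAT_DIM))      # five source-speaker frames
target_bnf = rng.normal(size=(BNF_DIM,))     # placeholder speaker vector
converted = convert(frames, target_bnf)
```

Because the speaker vector enters only at the decoder, the same encoder output can be paired with any speaker representation, which is one plausible reading of how a single model could serve many source-target pairs.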
Pages: 829 - 833
Page count: 5
Related papers
50 in total
  • [41] Enhanced Variational Auto-encoder for Voice Conversion Using Non-parallel Corpora
    Huang Guojie
    Jin Hui
    Yu Yibiao
    PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 46 - 49
  • [42] Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder
    Hsu, Chin-Cheng
    Hwang, Hsin-Te
    Wu, Yi-Chiao
    Tsao, Yu
    Wang, Hsin-Min
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [43] Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine
    Nakashika, Toru
    Takiguchi, Tetsuya
    Minami, Yasuhiro
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (11) : 2032 - 2045
  • [44] Data augmentation based non-parallel voice conversion with frame-level speaker disentangler
    Chen, Bo
    Xu, Zhihang
    Yu, Kai
    SPEECH COMMUNICATION, 2022, 136 : 14 - 22
  • [45] Non-Parallel Whisper-to-Normal Speaking Style Conversion Using Auxiliary Classifier Variational Autoencoder
    Seki, Shogo
    Kameoka, Hirokazu
    Kaneko, Takuhiro
    Tanaka, Kou
    IEEE ACCESS, 2023, 11 : 44590 - 44599
  • [46] Non-parallel training for voice conversion using background-based alignment of GMMs and INCA algorithm
    Ghorbandoost, Mostafa
    Saba, Valiallah
    IET SIGNAL PROCESSING, 2017, 11 (08) : 998 - 1005
  • [47] Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion
    Shah, Nirmesh J.
    Madhavi, Maulik C.
    Patil, Hemant A.
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1968 - 1972
  • [48] Investigation of Text-to-Speech-based Synthetic Parallel Data for Sequence-to-Sequence Non-Parallel Voice Conversion
    Ma, Ding
    Huang, Wen-Chin
    Toda, Tomoki
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 870 - 877
  • [49] Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning
    Zhang, Jing-Xuan
    Ling, Zhen-Hua
    Dai, Li-Rong
    INTERSPEECH 2020, 2020, : 771 - 775
  • [50] Non-parallel Voice Conversion Based on Perceptual Star Generative Adversarial Network
    Li, Yanping
    Qiu, Xiangtian
    Cao, Pan
    Zhang, Yan
    Bao, Bingkun
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (08) : 4632 - 4648