Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine

Cited by: 51
Authors
Nakashika, Toru [1]
Takiguchi, Tetsuya [2]
Minami, Yasuhiro [1]
Affiliations
[1] Univ Electrocommun, Grad Sch Informat Syst, Tokyo 1828585, Japan
[2] Kobe Univ, Org Adv Sci & Technol, Kobe, Hyogo 6578501, Japan
Keywords
Restricted Boltzmann machine; speaker adaptation; unsupervised training; voice conversion; NEURAL-NETWORKS; TRANSFORMATION; SPARSE;
DOI
10.1109/TASLP.2016.2593263
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. VC is a technique where only speaker-specific information in source speech is converted while keeping the phonological information unchanged. Most of the existing VC methods rely on parallel data, that is, pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: 1) the data used for the training are limited to the predefined sentences, 2) the trained model is only applied to the speaker pair used in the training, and 3) mismatches in alignment may occur. Although it is, thus, fairly preferable in VC not to use parallel data, a nonparallel approach is considered difficult to learn. In our approach, we achieve nonparallel training based on a speaker adaptation technique and capturing latent phonological information. This approach assumes that speech signals are produced from a restricted Boltzmann machine-based probabilistic model, where phonological information and speaker-related information are defined explicitly. Speaker-independent and speaker-dependent parameters are simultaneously trained under speaker adaptive training. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and then voice-converted speech is obtained by mixing the two. Our experimental results showed that our approach outperformed another nonparallel approach, and produced results similar to those of the popular conventional Gaussian mixture models-based method that used parallel data in subjective and objective criteria.
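The conversion stage described in the abstract (decompose, replace the speaker-related part, recombine) can be illustrated with a rough sketch. This is not the paper's formulation: it assumes a Gaussian-Bernoulli RBM with unit variance, a mean-field hidden layer standing in for the latent phonological information, and speaker-dependent parameters reduced to one adaptation matrix and one bias vector per speaker. The names `convert_frame`, `W`, `c`, `A`, and `b`, and the toy dimensions, are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def convert_frame(v, W, c, A, b, src, tgt):
    """Convert one acoustic feature frame from speaker `src` to speaker `tgt`.

    Assumed (hypothetical) parameterization, not taken from the paper:
      W : (D, H) speaker-independent weights
      c : (H,)   speaker-independent hidden bias
      A : dict mapping speaker -> (D, D) adaptation matrix
      b : dict mapping speaker -> (D,)   visible bias
    """
    # Decompose: infer latent activations using the source speaker's
    # adapted weights (mean-field, unit-variance assumption).
    h = sigmoid(c + (A[src] @ W).T @ v)
    # Replace and recombine: reconstruct the frame from the same latent
    # activations using the target speaker's parameters.
    return b[tgt] + A[tgt] @ W @ h


# Toy parameters; a trained model would supply these.
rng = np.random.default_rng(0)
D, H = 4, 3
W = rng.normal(size=(D, H))
c = np.zeros(H)
A = {"src": np.eye(D), "tgt": np.eye(D)}
b = {"src": np.zeros(D), "tgt": np.ones(D)}

v = rng.normal(size=D)
converted = convert_frame(v, W, c, A, b, "src", "tgt")
same_speaker = convert_frame(v, W, c, A, b, "src", "src")
```

In this toy setup the two speakers differ only in their bias vectors, so `converted` and `same_speaker` differ by exactly `b["tgt"] - b["src"]`: the latent activations are shared and only the speaker-dependent terms change.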
Pages: 2032-2045
Page count: 14
Related papers
50 in total
  • [41] Non-parallel Sequence-to-Sequence Voice Conversion for Arbitrary Speakers
    Zhang, Ying
    Che, Hao
    Wang, Xiaorui
    [J]. 2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [42] NON-PARALLEL TRAINING FOR MANY-TO-MANY EIGENVOICE CONVERSION
    Ohtani, Yamato
    Toda, Tomoki
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4822 - 4825
  • [43] Voice Conversion for Non-Parallel Datasets Using Dynamic Kernel Partial Least Squares Regression
    Silen, Hanna
    Nurminen, Jani
    Helander, Elina
    Gabbouj, Moncef
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 373 - 377
  • [44] Any-to-One Non-Parallel Voice Conversion System Using an Autoregressive Conversion Model and LPCNet Vocoder
    Ezzine, Kadria
    Di Martino, Joseph
    Frikha, Mondher
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (21):
  • [45] NON-PARALLEL VOICE CONVERSION USING JOINT OPTIMIZATION OF ALIGNMENT BY TEMPORAL CONTEXT AND SPECTRAL DISTORTION
    Benisty, H.
    Malah, D.
    Crammer, K.
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [46] Many-to-Many Voice Conversion based on Bottleneck Features with Variational Autoencoder for Non-parallel Training Data
    Li, Yanping
    Lee, Kong Aik
    Yuan, Yougen
    Li, Haizhou
    Yang, Zhen
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 829 - 833
  • [47] Non-parallel Many-to-many Voice Conversion with PSR-StarGAN
    Li, Yanping
    Xu, Dongxiang
    Zhang, Yan
    Wang, Yang
    Chen, Binbin
    [J]. INTERSPEECH 2020, 2020, : 781 - 785
  • [48] Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression
    Wu, Yi-Chiao
    Tobing, Patrick Lumban
    Kobayashi, Kazuhiro
    Hayashi, Tomoki
    Toda, Tomoki
    [J]. IEEE ACCESS, 2020, 8 : 62094 - 62106
  • [49] Mapping Frames with DNN-HMM Recognizer for Non-parallel Voice Conversion
    Dong, Minghui
    Yang, Chenyu
    Lu, Yanfeng
    Ehnes, Jochen Walter
    Huang, Dongyan
    Ming, Huaiping
    Tong, Rong
    Lee, Siu Wa
    Li, Haizhou
    [J]. 2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 488 - 494
  • [50] Non-parallel Many-to-many Singing Voice Conversion by Adversarial Learning
    Hu, Jinsen
    Yu, Chunyan
    Guan, Faqian
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 125 - 132