Speaker-adaptive-trainable Boltzmann machine and its application to non-parallel voice conversion

Cited by: 1
Authors
Nakashika, Toru [1 ]
Minami, Yasuhiro [1 ]
Affiliations
[1] Univ Electrocommun, Grad Sch Informat Syst, 1-5-1 Chofugaoka, Chofu, Tokyo 1828585, Japan
Keywords
Voice conversion; Boltzmann machine; Unsupervised training; Energy-based model; Speaker adaptation; Non-parallel training; SAT; TRANSFORMATION;
DOI
10.1186/s13636-017-0112-6
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
In this paper, we present a voice conversion (VC) method that does not require any parallel data for training. Voice conversion is a technique that converts only the speaker-specific information in the source speech while keeping the phonological information unchanged. Most existing VC methods rely on parallel data: pairs of speech from the source and target speakers uttering the same sentences. However, using parallel data for training causes several problems: (1) the training data are limited to pre-defined sentences, (2) the trained model can only be applied to the speaker pair used in training, and (3) alignment mismatches may occur. Although avoiding parallel data is generally preferable in VC, non-parallel approaches are considered difficult to train. In our approach, we realize non-parallel training based on speaker-adaptive training (SAT). Speech signals are represented by a probabilistic model based on the Boltzmann machine that explicitly defines phonological information and speaker-related information. Speaker-independent (SI) and speaker-dependent (SD) parameters are trained simultaneously using SAT. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and the voice-converted speech is obtained by recombining the two. Our experimental results show that our approach outperforms a conventional non-parallel approach on both objective and subjective criteria.
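The conversion stage described above (decompose the signal into phonological and speaker-related information, swap in the target speaker's information, recombine) can be sketched roughly as follows. This is a simplified illustration only: the dimensions, the Gaussian-Bernoulli RBM-style mean-field reconstruction, and all variable names are assumptions for the sketch, not the paper's actual model or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: D-dim spectral features, H binary phonological units.
D, H = 24, 16

# Speaker-independent (SI) weights shared by all speakers, and
# speaker-dependent (SD) visible biases -- a crude stand-in for the
# SI/SD parameter split trained jointly with SAT in the abstract.
W = rng.normal(scale=0.1, size=(D, H))          # SI parameters
b = {"src": rng.normal(scale=0.1, size=D),      # SD parameters per speaker
     "tgt": rng.normal(scale=0.1, size=D)}
c = np.zeros(H)                                  # SI hidden bias

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(v, spk):
    """Infer speaker-independent phonological hidden probabilities
    from a feature frame v using speaker spk's SD parameters."""
    return sigmoid((v - b[spk]) @ W + c)

def decode(h, spk):
    """Mean-field reconstruction of a feature frame for speaker spk
    from the phonological hidden units."""
    return W @ h + b[spk]

# Conversion: decompose source speech into phonological information,
# then recombine it with the target speaker's SD parameters.
v_src = rng.normal(size=D)          # one source-speaker feature frame
h = encode(v_src, "src")            # phonological information
v_conv = decode(h, "tgt")           # voice-converted frame

print(v_conv.shape)                 # (24,)
```

In the actual method, the SI and SD parameters would be estimated jointly over many speakers with SAT rather than drawn at random as here; the point of the sketch is only the decompose-replace-recombine flow of the conversion stage.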
Pages: 10
Related Papers
50 in total
  • [1] Speaker-adaptive-trainable Boltzmann machine and its application to non-parallel voice conversion
    Toru Nakashika
    Yasuhiro Minami
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2017
  • [2] SPEAKER ADAPTIVE MODEL BASED ON BOLTZMANN MACHINE FOR NON-PARALLEL TRAINING IN VOICE CONVERSION
    Nakashika, Toru
    Minami, Yasuhiro
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5530 - 5534
  • [3] Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine
    Nakashika, Toru
    Takiguchi, Tetsuya
    Minami, Yasuhiro
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (11) : 2032 - 2045
  • [4] Non-Parallel Voice Conversion Based on Free-Energy Minimization of Speaker-Conditional Restricted Boltzmann Machine
    Kishida, Takuya
    Nakashika, Toru
    [J]. PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 251 - 255
  • [5] BOOTSTRAPPING NON-PARALLEL VOICE CONVERSION FROM SPEAKER-ADAPTIVE TEXT-TO-SPEECH
    Luong, Hieu-Thi
    Yamagishi, Junichi
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 200 - 207
  • [6] A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data
    Tian, Xiaohai
    Chng, Eng Siong
    Li, Haizhou
    [J]. INTERSPEECH 2019, 2019, : 201 - 205
  • [7] Non-parallel Voice Conversion with Controllable Speaker Individuality using Variational Autoencoder
    Tuan Vu Ho
    Akagi, Masato
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 106 - 111
  • [8] Non-Parallel Any-to-Many Voice Conversion by Replacing Speaker Statistics
    Liu, Yufei
    Yu, Chengzhu
    Shuai, Wang
    Yang, Zhenchuan
    Chao, Yang
    Zhang, Weibin
    [J]. INTERSPEECH 2021, 2021, : 1369 - 1373
  • [9] Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition
    Ding, Shaojin
    Zhao, Guanlong
    Gutierrez-Osuna, Ricardo
    [J]. INTERSPEECH 2020, 2020, : 776 - 780
  • [10] Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations
    Zhang, Jing-Xuan
    Ling, Zhen-Hua
    Dai, Li-Rong
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 540 - 552