Non-parallel Many-to-many Singing Voice Conversion by Adversarial Learning

Cited by: 0
Authors
Hu, Jinsen [1 ]
Yu, Chunyan [1 ]
Guan, Faqian [1 ]
Affiliation
[1] Fuzhou Univ, Coll Math & Comp Sci, Fuzhou, Peoples R China
DOI
10.1109/apsipaasc47483.2019.9023357
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Although deep learning has driven great progress in speech conversion, it has rarely been applied to singing voice conversion, which at present relies mainly on statistical methods and can achieve only one-to-one conversion with parallel training datasets. Its application has therefore been limited. This paper proposes a generative adversarial learning model, MSVC-GAN, for many-to-many singing voice conversion using non-parallel datasets. First, the singer label, which denotes the domain constraint, is concatenated to the generator of our model. Furthermore, the model integrates a self-attention mechanism to capture long-term dependencies in the spectral features. Finally, switchable normalization is employed to stabilize network training. Both the objective and subjective evaluation results show that our model achieves the highest similarity and naturalness not only on the parallel speech dataset but also on the non-parallel singing dataset.
Pages: 125-132
Page count: 8