Non-parallel Many-to-many Singing Voice Conversion by Adversarial Learning

被引:0
|
作者
Hu, Jinsen [1 ]
Yu, Chunyan [1 ]
Guan, Faqian [1 ]
机构
[1] Fuzhou Univ, Coll Math & Comp Sci, Fuzhou, Peoples R China
关键词
D O I
10.1109/apsipaasc47483.2019.9023357
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
With the rapid development of deep learning, although speech conversion had made great progress, there are still rare researches in deep learning to model on singing voice conversion, which is mainly based on statistical methods at present and can only achieve one-to-one conversion with parallel training datasets. So far, its application is limited This paper proposes a generative adversarial learning model, MSVC-GAN, for many-to-many singing voice conversion using non-parallel datasets. First, the generator of our model is concatenated by the singer label, which denotes domain constraint Furthermore, the model integrates self-attention mechanism to capture long-term dependence on the spectral features. Finally, switchable normalization is employed to stabilize network training. Both the objective and subjective evaluation results show that our model achieves the highest similarity and naturalness not only on the parallel speech dataset but also on the non-parallel singing dataset.
引用
收藏
页码:125 / 132
页数:8
相关论文
共 50 条
  • [1] Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks
    Zhao, Shengkui
    Nguyen, Trung Hieu
    Wang, Hao
    Ma, Bin
    [J]. INTERSPEECH 2019, 2019, : 689 - 693
  • [2] A Survey on Generative Adversarial Networks based Models for Many-to-many Non-parallel Voice Conversion
    Alaa, Yasmin
    Alfonse, Marco
    Aref, Mostafa M.
    [J]. 5TH INTERNATIONAL CONFERENCE ON COMPUTING AND INFORMATICS (ICCI 2022), 2022, : 221 - 226
  • [3] Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition
    Ding, Shaojin
    Zhao, Guanlong
    Gutierrez-Osuna, Ricardo
    [J]. INTERSPEECH 2020, 2020, : 776 - 780
  • [4] Non-parallel Many-to-many Voice Conversion with PSR-StarGAN
    Li, Yanping
    Xu, Dongxiang
    Zhang, Yan
    Wang, Yang
    Chen, Binbin
    [J]. INTERSPEECH 2020, 2020, : 781 - 785
  • [5] NON-PARALLEL MANY-TO-MANY VOICE CONVERSION USING LOCAL LINGUISTIC TOKENS
    Wang, Chao
    Yu, Yibiao
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5929 - 5933
  • [6] NON-PARALLEL TRAINING FOR MANY-TO-MANY EIGENVOICE CONVERSION
    Ohtani, Yamato
    Toda, Tomoki
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4822 - 4825
  • [7] STARGAN-VC: NON-PARALLEL MANY-TO-MANY VOICE CONVERSION USING STAR GENERATIVE ADVERSARIAL NETWORKS
    Kameoka, Hirokazu
    Kaneko, Takuhiro
    Tanaka, Kou
    Hojo, Nobukatsu
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 266 - 273
  • [8] TEXT-FREE NON-PARALLEL MANY-TO-MANY VOICE CONVERSION USING NORMALISING FLOWS
    Merritt, Thomas
    Ezzerg, Abdelhamid
    Bilinski, Piotr
    Proszewska, Magdalena
    Pokora, Kamil
    Barra-Chicote, Roberto
    Korzekwa, Daniel
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6782 - 6786
  • [9] Many-to-Many Voice Conversion based on Bottleneck Features with Variational Autoencoder for Non-parallel Training Data
    Li, Yanping
    Lee, Kong Aik
    Yuan, Yougen
    Li, Haizhou
    Yang, Zhen
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 829 - 833
  • [10] F0-CONSISTENT MANY-TO-MANY NON-PARALLEL VOICE CONVERSION VIA CONDITIONAL AUTOENCODER
    Qian, Kaizhi
    Fin, Zeyu
    Hasegawa-Johnson, Mark
    Mysore, Gautham J.
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6284 - 6288