END-TO-END ZERO-SHOT VOICE CONVERSION USING A DDSP VOCODER

Cited by: 3
Author
Nercessian, Shahan [1]
Affiliation
[1] iZotope Inc, 60 Hampshire St, Cambridge, MA 02139 USA
Keywords
voice conversion; differentiable digital signal processing; end-to-end training; zero-shot learning
DOI
10.1109/WASPAA52581.2021.9632754
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose a zero-shot voice conversion algorithm using a neural vocoder based on differentiable digital signal processing. The vocoder does not require auto-regression, and its lightweight, differentiable nature allows the proposed system to be trained in an end-to-end fashion. This enables the use of more perceptually relevant objective functions for model training, and allows the feature conversion and vocoder sub-networks to internally learn their own acoustic representation in a data-driven manner. We illustrate the effectiveness of the proposed algorithm by both qualitative and quantitative means, with comparisons to some of our previous works.
Pages: 306-310
Number of pages: 5