End-to-End Voice Conversion with Information Perturbation

Cited by: 1
Authors
Xie, Qicong [1]
Yang, Shan [2]
Lei, Yi [1]
Xie, Lei [1]
Su, Dan [2]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp, Xian, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
Keywords
voice conversion; end-to-end; any-to-any
DOI
10.1109/ISCSLP57327.2022.10037890
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The ideal goal of voice conversion is to make the source speaker's speech sound naturally like the target speaker while preserving the linguistic content and prosody of the source speech. However, current approaches fall short of fully transferring source prosody and preserving target speaker timbre in the converted speech, and the quality of the converted speech is also unsatisfactory because of the mismatch between the acoustic model and the vocoder. In this paper, we leverage recent advances in information perturbation and propose a fully end-to-end approach to high-quality voice conversion. We first adopt information perturbation to remove speaker-related information from the source speech, disentangling speaker timbre from linguistic content; the linguistic information is then modeled by a content encoder. To better transfer the prosody of the source speech to the target, we introduce a speaker-related pitch encoder that maintains the general pitch pattern of the source speaker while flexibly modifying the pitch intensity of the generated speech. Finally, one-shot voice conversion is achieved through continuous speaker space modeling. Experimental results indicate that the proposed end-to-end approach significantly outperforms state-of-the-art models in terms of intelligibility, naturalness, and speaker similarity.
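The abstract does not say which perturbation functions the authors use. As a rough illustration only, the sketch below shows one crude way a waveform could be perturbed so that speaker-related cues change: resampling by a random factor, which scales pitch and formant frequencies together. The function names, the resampling approach, and the factor range are all assumptions made for this example and are not taken from the paper; a practical system would more likely use perturbations that keep the duration unchanged.

```python
import numpy as np


def perturb_by_resampling(wave, factor):
    """Resample `wave` by `factor` using linear interpolation.

    Hypothetical helper (not from the paper). A factor above 1 shortens
    the signal and scales every frequency in it, pitch and formants
    alike, up by that factor when played at the original sample rate.
    """
    n_out = max(1, int(round(len(wave) / factor)))
    # Fractional positions in the input that each output sample reads from.
    positions = np.clip(np.arange(n_out) * factor, 0, len(wave) - 1)
    return np.interp(positions, np.arange(len(wave)), wave)


def dominant_frequency(wave, sample_rate):
    """Return the frequency (Hz) of the largest magnitude-spectrum peak."""
    spectrum = np.abs(np.fft.rfft(wave))
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]


# A 200 Hz tone stands in for a voiced speech segment (assumed toy input).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 200.0 * t)

# Draw a random factor; the range here is an arbitrary choice.
rng = np.random.default_rng(0)
factor = rng.uniform(1.1, 1.4)
perturbed = perturb_by_resampling(tone, factor)

print("factor:", factor)
print("original peak (Hz):", dominant_frequency(tone, sample_rate))
print("perturbed peak (Hz):", dominant_frequency(perturbed, sample_rate))
```

In this toy setup the spectral peak of the perturbed tone moves to roughly 200 Hz times the factor, while the order of events in the signal is unchanged. This is the intuition behind training a content encoder on perturbed input: the perturbed signal no longer carries the original speaker's pitch and formant cues, but it still carries the same sequence of sounds.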
Pages: 91-95
Page count: 5