Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario

Authors
Weng, Shao-En [1]
Shuai, Hong-Han [1]
Cheng, Wen-Huang [1]
Affiliations
[1] National Yang Ming Chiao Tung University, Hsinchu, Taiwan
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A face often corresponds to a voice, and a person's appearance can be strongly related to their voice. In this work, we study how a face can be converted to a voice, that is, face-based voice conversion. Because no clean dataset contains both faces and speech, face-based voice conversion is difficult to learn and yields low-quality output owing to background noise and echo. Excessive redundant information in the face-to-voice mapping also causes the model to synthesize a generic speech style. Furthermore, previous work disentangled speech by adjusting a bottleneck, but the bottleneck size is hard to choose. We therefore propose a bottleneck-free strategy for speech disentanglement. To avoid synthesizing a generic speech style, we use framewise facial embeddings. We apply adversarial learning with a multi-scale discriminator to improve synthesis quality. In addition, a self-attention module is added to focus on content-related features in in-the-wild data. Quantitative experiments show that our method outperforms previous work.
Pages: 13718-13726
Page count: 9
Related papers
25 in total
  • [1] Zero-shot voice conversion based on feature disentanglement
    Guo, Na
    Wei, Jianguo
    Li, Yongwei
    Lu, Wenhuan
    Tao, Jianhua
    SPEECH COMMUNICATION, 2024, 165
  • [2] Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment
    Sheng, Zheng-Yan
    Ai, Yang
    Chen, Yan-Nian
    Ling, Zhen-Hua
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 8443 - 8452
  • [3] ROBUST DISENTANGLED VARIATIONAL SPEECH REPRESENTATION LEARNING FOR ZERO-SHOT VOICE CONVERSION
    Lian, Jiachen
    Zhang, Chunlei
    Yu, Dong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6572 - 6576
  • [4] Stylometry for real-world expert coders: a zero-shot approach
    Gurioli, Andrea
    Gabbrielli, Maurizio
    Zacchiroli, Stefano
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [6] WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions
    Rekimoto, Jun
    PROCEEDINGS OF THE 2023 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2023, 2023
  • [7] LM-VC: Zero-Shot Voice Conversion via Speech Generation Based on Language Models
    Wang Z.
    Chen Y.
    Xie L.
    Tian Q.
    Wang Y.
    IEEE Signal Processing Letters, 2023, 30 : 1157 - 1161
  • [8] GAZEV: GAN-Based Zero-Shot Voice Conversion over Non-parallel Speech Corpus
    Zhang, Zining
    He, Bingsheng
    Zhang, Zhenjie
    INTERSPEECH 2020, 2020, : 791 - 795
  • [9] SLMGAN: EXPLOITING SPEECH LANGUAGE MODEL REPRESENTATIONS FOR UNSUPERVISED ZERO-SHOT VOICE CONVERSION IN GANS
    Li, Yinghao Aaron
    Han, Cong
    Mesgarani, Nima
    2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023
  • [10] DualSR: Zero-Shot Dual Learning for Real-World Super-Resolution
    Emad, Mohammad
    Peemen, Maurice
    Corporaal, Henk
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1629 - 1638