Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario

Cited by: 0

Authors:

Weng, Shao-En [1]

Shuai, Hong-Han [1]

Cheng, Wen-Huang [1]

Affiliations:

[1] Natl Yang Ming Chiao Tung Univ, Hsinchu, Taiwan

Source:

THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Often a face has a voice. Appearance sometimes has a strong relationship with one's voice. In this work, we study how a face can be converted to a voice, a task known as face-based voice conversion. Since there is no clean dataset that contains both face and speech, voice conversion suffers from difficult learning and low-quality output caused by background noise or echo. Too much redundant information in the face-to-voice mapping also causes the model to synthesize a general style of speech. Furthermore, previous work tried to disentangle speech by adjusting a bottleneck. However, it is hard to decide on the size of the bottleneck. Therefore, we propose a bottleneck-free strategy for speech disentanglement. To avoid synthesizing the general style of speech, we utilize framewise facial embedding. We also apply adversarial learning with a multi-scale discriminator so that the model achieves better quality. In addition, a self-attention module is added to focus on content-related features for in-the-wild data. Quantitative experiments show that our method outperforms previous work.


Pages: 13718-13726

Page count: 9

