Hear Your Face: Face-based voice conversion with F0 estimation

Cited by: 0

Authors:

Lee, Jaejun [1]

Oh, Yoori [1]

Hwang, Injune [1]

Lee, Kyogu [1,2,3]

Affiliations:

[1] Seoul Natl Univ, Dept Intelligence & Informat, Seoul, South Korea

[2] Seoul Natl Univ, Interdisciplinary Program Artificial Intelligence, Seoul, South Korea

[3] Seoul Natl Univ, Artificial Intelligence Inst, Seoul, South Korea

Source:

INTERSPEECH 2024 | 2024

Keywords:

voice conversion; face/voice association; cross modal generation; speaker embedding; identity

DOI:

10.21437/Interspeech.2024-232

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper delves into the emerging field of face-based voice conversion, leveraging the unique relationship between an individual's facial features and their vocal characteristics. We present a novel face-based voice conversion framework that particularly utilizes the average fundamental frequency of the target speaker, derived solely from their facial images. Through extensive analysis, our framework demonstrates superior speech generation quality and the ability to align facial features with voice characteristics, including tracking of the target speaker's fundamental frequency.


Pages: 4378-4382

Page count: 5
