Hear Your Face: Face-based voice conversion with F0 estimation

Cited by: 0
Authors
Lee, Jaejun [1]
Oh, Yoori [1]
Hwang, Injune [1]
Lee, Kyogu [1, 2, 3]
Affiliations
[1] Seoul Natl Univ, Dept Intelligence & Informat, Seoul, South Korea
[2] Seoul Natl Univ, Interdisciplinary Program Artificial Intelligence, Seoul, South Korea
[3] Seoul Natl Univ, Artificial Intelligence Inst, Seoul, South Korea
Source
Interspeech 2024
Keywords
voice conversion; face/voice association; cross modal generation; speaker embedding; identity
DOI
10.21437/Interspeech.2024-232
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper delves into the emerging field of face-based voice conversion, leveraging the unique relationship between an individual's facial features and their vocal characteristics. We present a novel face-based voice conversion framework that particularly utilizes the average fundamental frequency of the target speaker, derived solely from their facial images. Through extensive analysis, our framework demonstrates superior speech generation quality and the ability to align facial features with voice characteristics, including tracking of the target speaker's fundamental frequency.
Pages: 4378-4382
Number of pages: 5
Related papers
50 in total
  • [21] A face-based computer login system
    King, S
    Tian, GY
    Ward, S
    ADVANCES IN E-ENGINEERING AND DIGITAL ENTERPRISE TECHNOLOGY-I, PROCEEDINGS, 2004, : 523 - 533
  • [22] Is the mere exposure effect in face attractiveness image-based or face-based?
    Cullen, B.
    Newell, F.
    PERCEPTION, 2013, 42 : 203 - 203
  • [23] Face and voice perception: Monkey see, monkey hear
    Beauchamp, Michael S.
    CURRENT BIOLOGY, 2021, 31 (09) : R435 - R437
  • [24] The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion
    Chen, Ling-Hui
    Liu, Li-Juan
    Ling, Zhen-Hua
    Jiang, Yuan
    Dai, Li-Rong
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1642 - 1646
  • [25] Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform
    Zhaojie Luo
    Jinhui Chen
    Tetsuya Takiguchi
    Yasuo Ariki
    EURASIP Journal on Audio, Speech, and Music Processing, 2017
  • [26] Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform
    Luo, Zhaojie
    Chen, Jinhui
    Takiguchi, Tetsuya
    Ariki, Yasuo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2017,
  • [27] Cross-lingual voice conversion based on F0 multi-scale modeling with VITS
    Cao, Danyang
    Zhang, Zeyi
    PROCEEDINGS OF 2024 3RD INTERNATIONAL CONFERENCE ON CYBER SECURITY, ARTIFICIAL INTELLIGENCE AND DIGITAL ECONOMY, CSAIDE 2024, 2024, : 375 - 379
  • [28] Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario
    Weng, Shao-En
    Shuai, Hong-Han
    Cheng, Wen-Huang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 13718 - 13726
  • [29] Improving Face-Based Age Estimation With Attention-Based Dynamic Patch Fusion
    Wang, Haoyi
    Sanchez, Victor
    Li, Chang-Tsun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 1084 - 1096
  • [30] Emotional Voice Conversion Using Deep Neural Networks with MCC and F0 Features
    Luo, Zhaojie
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2016 IEEE/ACIS 15TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS), 2016, : 977 - 981