Hear Your Face: Face-based voice conversion with F0 estimation

Cited by: 0
Authors
Lee, Jaejun [1]
Oh, Yoori [1]
Hwang, Injune [1]
Lee, Kyogu [1, 2, 3]
Affiliations
[1] Seoul Natl Univ, Dept Intelligence & Informat, Seoul, South Korea
[2] Seoul Natl Univ, Interdisciplinary Program Artificial Intelligence, Seoul, South Korea
[3] Seoul Natl Univ, Artificial Intelligence Inst, Seoul, South Korea
Source
Interspeech 2024
Keywords
voice conversion; face/voice association; cross-modal generation; speaker embedding; identity
DOI
10.21437/Interspeech.2024-232
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper delves into the emerging field of face-based voice conversion, leveraging the unique relationship between an individual's facial features and their vocal characteristics. We present a novel face-based voice conversion framework that particularly utilizes the average fundamental frequency of the target speaker, derived solely from their facial images. Through extensive analysis, our framework demonstrates superior speech generation quality and the ability to align facial features with voice characteristics, including tracking of the target speaker's fundamental frequency.
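The abstract describes conditioning conversion on the target speaker's average fundamental frequency (F0). As a hedged illustration only, the Python sketch below shows one common way an utterance-level average F0 can be computed from a frame-level F0 contour (averaging voiced frames only), and how a contour could be rescaled so its average matches a target value, using a single constant ratio, which equals a constant offset in log-F0. This is not the paper's method: the abstract states only that the average F0 is derived from facial images, and the function names average_f0 and match_average_f0, as well as the convention that unvoiced frames are stored as 0 Hz, are assumptions made for this example.

```python
import numpy as np

# Hypothetical helpers for illustration; not taken from the paper.
# Convention assumed here: unvoiced frames are stored as 0 Hz.


def average_f0(f0_hz: np.ndarray) -> float:
    """Arithmetic mean F0 in Hz over voiced (non-zero) frames."""
    voiced = f0_hz[f0_hz > 0]
    if voiced.size == 0:
        raise ValueError("F0 contour contains no voiced frames")
    return float(voiced.mean())


def match_average_f0(f0_hz: np.ndarray, target_mean_hz: float) -> np.ndarray:
    """Rescale voiced frames so their mean equals target_mean_hz.

    Multiplying every voiced frame by one ratio is a constant shift in
    log-F0, so intervals between frames (in semitones) are preserved.
    Unvoiced frames stay at 0 Hz. The input array is not modified.
    """
    ratio = target_mean_hz / average_f0(f0_hz)
    out = f0_hz.astype(float)  # astype returns a new array by default
    out[out > 0] *= ratio
    return out


# Usage: a toy 5-frame contour with two unvoiced frames.
f0 = np.array([0.0, 100.0, 120.0, 0.0, 140.0])
print(average_f0(f0))               # 120.0
print(match_average_f0(f0, 240.0))  # voiced frames become 200, 240, 280 Hz
```

Because every voiced frame is multiplied by the same ratio, the contour's shape is kept while its mean moves to the target. In a face-based system of the kind the abstract describes, the target value would presumably come from the face-derived estimate rather than from reference audio.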
Pages: 4378-4382
Page count: 5
Related papers
50 in total
  • [41] Neural pathways subserving face-based mentalizing
    Yordanova, Yordanka Nikolova
    Duffau, Hugues
    Herbet, Guillaume
    BRAIN STRUCTURE & FUNCTION, 2017, 222 (07) : 3087 - 3105
  • [42] Intelligent Face-Based Mobile Images Categorization
    Chen, Duan-Yu
    Tsai, Jeng-Tsung
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2013, 29 (02) : 347 - 360
  • [43] Shared Features for Multiple Face-Based Biometrics
    Nwogu, Ifeoma
    Zhou, Yingbo
    2014 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2014 : 417 - 422
  • [44] Face-based digital signatures for video retrieval
    Cotsaces, Costas
    Nikolaidis, Nikos
    Pitas, Ioannis
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2008, 18 (04) : 549 - 553
  • [45] Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment
    Sheng, Zheng-Yan
    Ai, Yang
    Chen, Yan-Nian
    Ling, Zhen-Hua
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 8443 - 8452
  • [46] Automatic face-based image grouping for albuming
    Das, M
    Loui, AC
    2003 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-5, CONFERENCE PROCEEDINGS, 2003 : 3726 - 3731
  • [47] Features for robust face-based identity verification
    Sanderson, C
    Paliwal, KK
    SIGNAL PROCESSING, 2003, 83 (05) : 931 - 940
  • [49] Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion
    Wu, Zhi-Zheng
    Kinnunen, Tomi
    Chng, Eng Siong
    Li, Haizhou
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010 : 1732 - +
  • [50] Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion
    Nose, Takashi
    Kobayashi, Takao
    2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014), 2014 : 578 - 581