SPACE: Speech-driven Portrait Animation with Controllable Expression

Authors
Gururani, Siddharth [1]
Mallya, Arun [1]
Wang, Ting-Chun [1]
Valle, Rafael [1]
Liu, Ming-Yu [1]
Affiliations
[1] NVIDIA, Santa Clara, CA 95051 USA
DOI
10.1109/ICCV51070.2023.01912
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Animating portraits using speech has received growing attention in recent years, with various creative and practical use cases. An ideal generated video should have good lip sync with the audio, natural facial expressions and head motions, and high frame quality. In this work, we present SPACE, which uses speech and a single image to generate high-resolution, expressive videos with realistic head pose, without requiring a driving video. It uses a multi-stage approach, combining the controllability of facial landmarks with the high-quality synthesis power of a pretrained face generator. SPACE also allows control over emotions and their intensities. Our method outperforms prior methods on objective metrics for image quality and facial motion, and is strongly preferred by users in pairwise comparisons. Videos and further results are available on the project page: https://research.nvidia.com/labs/dir/space/.
Pages: 20857-20866
Page count: 10