EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars

Cited by: 1

Authors:

Drobyshev, Nikita [1]

Casademunt, Antoni Bigata [1]

Vougioukas, Konstantinos [1]

Landgraf, Zoe [1]

Petridis, Stavros [1]

Pantic, Maja [1]

Affiliations:

[1] Imperial Coll London, London, England

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024 | 2024

Keywords:

DOI:

10.1109/CVPR52733.2024.00812

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Head avatars animated by visual signals have gained popularity, particularly in cross-driving synthesis, where the driver differs from the animated character, a challenging but highly practical approach. The recently presented MegaPortraits model has demonstrated state-of-the-art results in this domain. We conduct a deep examination and evaluation of this model, with a particular focus on its latent space for facial expression descriptors, and uncover several limitations in its ability to express intense face motions. To address these limitations, we propose substantial changes in both the training pipeline and the model architecture to introduce our EMOPortraits model, where we: (1) enhance the model's capability to faithfully support intense, asymmetric face expressions, setting a new state-of-the-art result in the emotion transfer task and surpassing previous methods in both metrics and quality; (2) incorporate a speech-driven mode into our model, achieving top-tier performance in audio-driven facial animation and making it possible to drive the source identity through diverse modalities, including visual signal, audio, or a blend of both. Furthermore, we propose a novel multi-view video dataset featuring a wide range of intense and asymmetric facial expressions, filling the gap left by the absence of such data in existing datasets.

Pages: 8498-8507

Page count: 10
