3D-Aware Talking-Head Video Motion Transfer

Citations: 0

Authors:

Ni, Haomiao [1]

Liu, Jiachen [1]

Xue, Yuan [2]

Huang, Sharon X. [1]

Affiliations:

[1] Penn State Univ, University Pk, PA 16802 USA

[2] Ohio State Univ, Columbus, OH USA

Source:

2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024) | 2024

Keywords:

NETWORK; MODEL;

DOI:

10.1109/WACV57701.2024.00488

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Motion transfer of talking-head videos involves generating a new video with the appearance of a subject video and the motion pattern of a driving video. Current methodologies primarily depend on a limited number of subject images and 2D representations, thereby neglecting to fully utilize the multi-view appearance features inherent in the subject video. In this paper, we propose a novel 3D-aware talking-head video motion transfer network, Head3D, which fully exploits the subject appearance information by generating a visually-interpretable 3D canonical head from the 2D subject frames with a recurrent network. A key component of our approach is a self-supervised 3D head geometry learning module, designed to predict head poses and depth maps from 2D subject video frames. This module facilitates the estimation of a 3D head in canonical space, which can then be transformed to align with driving video frames. Additionally, we employ an attention-based fusion network to combine the background and other details from subject frames with the 3D subject head to produce the synthetic target video. Our extensive experiments on two public talking-head video datasets demonstrate that Head3D outperforms both 2D and 3D prior arts in the practical cross-identity setting, with evidence showing it can be readily adapted to the pose-controllable novel view synthesis task.
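The geometric idea in the abstract (predict a depth map and a head pose per subject frame, lift the head into a canonical space, then re-pose it to match a driving frame) can be illustrated with a small sketch. This is not the Head3D implementation: the pinhole camera model, the intrinsics `fx`, `fy`, `cx`, `cy`, the yaw-only rotation, and all function names are assumptions chosen for illustration, and the depth and pose values are made-up placeholders rather than network predictions.

```python
import math

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (list of rows) to camera-space 3D points (assumed pinhole model)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def yaw_rotation(angle_rad):
    """Rotation matrix about the vertical (y) axis; a stand-in for a full head pose."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def apply_pose(points, R, t):
    """Apply the rigid transform x' = R x + t to every point."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def invert_pose(R, t):
    """Inverse of x' = R x + t, i.e. x = R^T x' - R^T t."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return Rt, t_inv

# Placeholder 2x2 depth map and poses (hypothetical values, not model outputs).
depth = [[2.0, 2.0], [2.0, 2.0]]
pts = backproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)

# Assumed convention: a pose maps canonical-space points into a frame's camera space.
R_subj, t_subj = yaw_rotation(math.radians(30)), [0.0, 0.0, 0.5]
R_inv, t_inv = invert_pose(R_subj, t_subj)
canonical = apply_pose(pts, R_inv, t_inv)        # subject frame -> canonical space

R_drv, t_drv = yaw_rotation(math.radians(-15)), [0.1, 0.0, 0.5]
driven = apply_pose(canonical, R_drv, t_drv)     # canonical space -> driving pose
```

Under these assumptions, re-applying the subject pose to `canonical` recovers the original back-projected points, which is a quick sanity check that the canonicalization step is a pure rigid transform.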


Pages: 4942-4952

Page count: 11

