Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer

Cited by: 4

Authors:

Huang, Ailin [1,2]

Huang, Zhewei [1]

Zhou, Shuchang [1]

Affiliations:

[1] Megvii Res, Beijing, Peoples R China

[2] Wuhan Univ, Wuhan, Peoples R China

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Keywords:

Conversational Head Generation;

DOI:

10.1145/3503161.3551577

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

This paper reports our solution for the ACM Multimedia ViCo 2022 Conversational Head Generation Challenge, which aims to generate vivid face-to-face conversation videos based on audio and reference images. Our solution focuses on training a generalized audio-to-head driver using regularization and assembling a renderer with high visual quality. We carefully tune the audio-to-behavior model and post-process the generated video using our foreground-background fusion module. Our solution took first place in the listening head generation track and second place in the talking head generation track on the official leaderboard. Our code is available at https://github.com/megvii-research/MM2022-ViCoPerceptualHeadGeneration.


Pages: 7050-7054

Page count: 5
