Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer

Cited by: 4
Authors
Huang, Ailin [1, 2]
Huang, Zhewei [1]
Zhou, Shuchang [1]
Affiliations
[1] Megvii Research, Beijing, People's Republic of China
[2] Wuhan University, Wuhan, People's Republic of China
Keywords
Conversational Head Generation
DOI
10.1145/3503161.3551577
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper reports our solution for the ACM Multimedia ViCo 2022 Conversational Head Generation Challenge, which aims to generate vivid face-to-face conversation videos from audio and reference images. Our solution focuses on training a generalized audio-to-head driver using regularization and on assembling a renderer with high visual quality. We carefully tune the audio-to-behavior model and post-process the generated video with our foreground-background fusion module. We placed first in the listening head generation track and second in the talking head generation track on the official leaderboard. Our code is available at https://github.com/megvii-research/MM2022-ViCoPerceptualHeadGeneration.
Pages: 7050-7054
Page count: 5
Related papers
18 in total
  • [1] A Baseline for ViCo Conversational Head Generation Challenge
    Liu, Meng
    Zhai, Shuyan
    Li, Yongqiang
    Guan, Weili
    Nie, Liqiang
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 7013 - 7015
  • [2] Learning and Evaluating Human Preferences for Conversational Head Generation
    Zhou, Mohan
    Bai, Yalong
    Zhang, Wei
    Yao, Ting
    Zhao, Tiejun
    Mei, Tao
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 9615 - 9619
  • [3] Improvements on SadTalker-based Approach for ViCo Conversational Head Generation Challenge
    Dai, Wei
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 9566 - 9570
  • [4] Corpus-based generation of head and eyebrow motion for an embodied conversational agent
    Foster, Mary Ellen
    Oberlander, Jon
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2007, 41 (3-4) : 305 - 323
  • [5] Towards Realistic Conversational Head Generation: A Comprehensive Framework for Lifelike Video Synthesis
    Liu, Meng
    Li, Yongqiang
    Zhai, Shuyan
    Guan, Weili
    Nie, Liqiang
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 9441 - 9445
  • [6] Corpus-based generation of head and eyebrow motion for an embodied conversational agent
    Mary Ellen Foster
    Jon Oberlander
    [J]. Language Resources and Evaluation, 2007, 41 : 305 - 323
  • [7] Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline
    Chang, Zhigang
    Hu, Weitai
    Yang, Qing
    Zheng, Shibao
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 9581 - 9585
  • [8] Audio-Semantic Enhanced Pose-Driven Talking Head Generation
    Liu, Meng
    Li, Da
    Li, Yongqiang
    Song, Xuemeng
    Nie, Liqiang
    [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (11) : 11056 - 11069
  • [9] Spatial Object Tracking Using an Enhanced Mean Shift Method Based on Perceptual Spatial-Space Generation Model
    Han, Pengcheng
    Du, Junping
    Fang, Ming
    [J]. JOURNAL OF APPLIED MATHEMATICS, 2013,
  • [10] SCHEDULING OF VARIABLE-HEAD HYDRO-THERMAL GENERATION USING AN ENHANCED BACTERIAL FORAGING ALGORITHM
    Farhat, I. A.
    El-Hawary, M. E.
    [J]. 2011 24TH CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2011, : 436 - 441