Multi-Modal Driven Pose-Controllable Talking Head Generation

Cited by: 0

Authors:

Sun, Kuiyuan [1]

Liu, Xiaolong [1]

Li, Xiaolong [1]

Zhao, Yao [1]

Wang, Wei [1]

Affiliation:

[1] Institute of Information Science, Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing Jiaotong University, Beijing, China

Source:

ACM Transactions on Multimedia Computing, Communications and Applications | 2024 / Vol. 20 / No. 12

Keywords:

DOI:

10.1145/3673901

Chinese Library Classification (CLC) number:

G2 [Information and Knowledge Dissemination];

Discipline classification code:

05 ; 0503 ;

Abstract:

Talking head generation, which drives a source image with information from another modality to produce a talking video, has made great progress in recent years. However, two main issues remain: (1) existing methods are designed to use only a single modality of information, and (2) most methods cannot control head pose. To address these problems, we propose a novel framework that uses multi-modal information to generate a talking head video while achieving arbitrary head pose control through a movement sequence. Specifically, first, to extend the driving information to multiple modalities, multi-modal information is encoded into a unified semantic latent space to generate expression parameters. Second, to disentangle attributes, the 3D Morphable Model (3DMM) is used to obtain identity information from the source image, and translation and rotation information from the target image. Third, to control head pose and mouth shape, the source image is warped by a motion field generated from the expression, translation, and angle parameters. Finally, all of the above parameters are used to render a landmark map, and the warped source image is combined with the landmark map to generate a detailed talking head video. Experimental results demonstrate that the proposed method achieves state-of-the-art performance in visual quality, lip-audio synchronization, and head pose control. © 2024 Copyright held by the owner/author(s).
