Multi-Modal Driven Pose-Controllable Talking Head Generation

Authors
Sun, Kuiyuan [1]
Liu, Xiaolong [1]
Li, Xiaolong [1]
Zhao, Yao [1]
Wang, Wei [1]
Affiliation
[1] Institute of Information Science, Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing Jiaotong University, Beijing, China
DOI
10.1145/3673901
Chinese Library Classification number
G2 [Information and knowledge dissemination]
Subject classification codes
05; 0503
Abstract
Talking head generation, which drives a source image to produce a talking video using information from another modality, has made great progress in recent years. However, two main issues remain: (1) existing methods are designed to use information from only a single modality, and (2) most methods cannot control head pose. To address these problems, we propose a novel framework that uses multi-modal information to generate a talking head video while achieving arbitrary head pose control through a movement sequence. Specifically, first, to extend the driving information to multiple modalities, multi-modal inputs are encoded into a unified semantic latent space from which expression parameters are generated. Second, to disentangle attributes, the 3D Morphable Model (3DMM) is used to obtain identity information from the source image and translation and rotation information from the target image. Third, to control head pose and mouth shape, the source image is warped by a motion field generated from the expression, translation, and angle parameters. Finally, all of the above parameters are used to render a landmark map, and the warped source image is combined with the landmark map to generate a detailed talking head video. Experimental results demonstrate that the proposed method achieves state-of-the-art performance in visual quality, lip-audio synchronization, and head pose control. © 2024 Copyright held by the owner/author(s)
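The attribute disentanglement and landmark rendering steps rest on the standard 3DMM decomposition: a face shape is the mean shape plus an identity offset and an expression offset, which is then posed by a rotation and a translation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses a toy 3DMM with random placeholder bases (a real model would load learned ones), Euler angles in radians, and a weak-perspective camera. The names `compose_landmarks`, `euler_to_matrix`, and the basis sizes are hypothetical. It only shows how identity from the source image, expression from the driving modality, and pose from the target frame can be combined into one set of 2D landmarks.

```python
import math
import random

# Hypothetical toy 3DMM: real models ship learned mean shapes and bases;
# here they are seeded random placeholders, for illustration only.
N_VERTS, N_ID, N_EXP = 68, 8, 6
_rng = random.Random(0)
MEAN_SHAPE = [[_rng.gauss(0, 1) for _ in range(3)] for _ in range(N_VERTS)]
# Each basis maps a coefficient vector to a per-vertex 3D offset: [vertex][axis][k].
ID_BASIS = [[[_rng.gauss(0, 0.01) for _ in range(N_ID)] for _ in range(3)] for _ in range(N_VERTS)]
EXP_BASIS = [[[_rng.gauss(0, 0.01) for _ in range(N_EXP)] for _ in range(3)] for _ in range(N_VERTS)]


def _matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def euler_to_matrix(pitch, yaw, roll):
    """Return R = Rz(roll) @ Ry(yaw) @ Rx(pitch) as a 3x3 nested list (radians)."""
    cx, sx = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cz, sz = math.cos(roll), math.sin(roll)
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return _matmul(rz, _matmul(ry, rx))


def compose_landmarks(alpha_src, beta_drv, angles_tgt, trans_tgt, scale=1.0):
    """Mix identity (source image), expression (driving signal) and pose
    (target frame), then project to 2D with a weak-perspective camera."""
    rot = euler_to_matrix(*angles_tgt)
    points = []
    for v in range(N_VERTS):
        # S = S_mean + B_id * alpha + B_exp * beta, evaluated per vertex and axis
        p = [
            MEAN_SHAPE[v][a]
            + sum(ID_BASIS[v][a][k] * alpha_src[k] for k in range(N_ID))
            + sum(EXP_BASIS[v][a][k] * beta_drv[k] for k in range(N_EXP))
            for a in range(3)
        ]
        # Rotate the vertex, drop depth, then scale and translate in the image plane
        r = [sum(rot[i][j] * p[j] for j in range(3)) for i in range(3)]
        points.append((scale * r[0] + trans_tgt[0], scale * r[1] + trans_tgt[1]))
    return points


if __name__ == "__main__":
    alpha = [0.0] * N_ID  # identity coefficients (would come from the source image)
    beta = [0.5] * N_EXP  # expression coefficients (would come from the driving modality)
    pose = (0.0, math.radians(15), 0.0)  # pitch, yaw, roll (would come from the target frame)
    lms = compose_landmarks(alpha, beta, pose, (0.0, 0.0, 0.0))
    print(len(lms), lms[0])
```

Because each attribute enters through its own argument, swapping only `angles_tgt` and `trans_tgt` changes head pose while leaving identity and mouth shape untouched, which is the kind of separation the abstract attributes to the 3DMM.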