Multimodal attention for lip synthesis using conditional generative adversarial networks

Cited by: 0
|
Authors
Vidal, Andrea [1 ]
Busso, Carlos [1 ]
Affiliations
[1] Univ Texas Dallas, Dept Elect & Comp Engn, 800 W Campbell Rd, Richardson, TX 75080 USA
Funding
U.S. National Science Foundation;
Keywords
Speech-driven animations; Socially interactive agents; Conditional GAN; Lip movements; Cross-modal attention; Attention mechanism; FACIAL ANIMATION; HEAD MOTION; SPEECH; DRIVEN;
DOI
10.1016/j.specom.2023.102959
CLC classification number
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
The synthesis of lip movements is an important problem for a socially interactive agent (SIA). It is important to generate lip movements that are synchronized with speech and have realistic co-articulation. We hypothesize that combining lexical information (i.e., the sequence of phonemes) and acoustic features can lead not only to models that generate the correct lip movements matching the articulatory movements, but also to trajectories that are well synchronized with the speech emphasis and emotional content. This work presents attention-based frameworks that use acoustic and lexical information to enhance the synthesis of lip movements. The lexical information is obtained from automatic speech recognition (ASR) transcriptions, broadening the range of applications of the proposed solution. We propose models based on conditional generative adversarial networks (CGANs) with self-modality and cross-modality attention mechanisms. These models allow us to identify which frames contribute most to the generation of lip movements. We animate the synthesized lip movements using blendshapes. These animations are used to compare our proposed multimodal models with alternative methods, including unimodal models implemented with either text or acoustic features. We rely on subjective metrics from perceptual evaluations and an objective metric based on the LipSync model. The results show that our proposed models with attention mechanisms are preferred over the baselines in terms of perceived naturalness. The addition of cross-modality and self-modality attention has a significant positive impact on the quality of the generated sequences. We observe that lexical information provides valuable cues even when the transcriptions are not perfect. The improved performance of the multimodal system confirms the complementary information provided by the speech and text modalities.
Pages: 12
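The abstract describes a conditional GAN generator that fuses acoustic features with phoneme-level lexical features through self-modality and cross-modality attention before predicting blendshape trajectories for lip animation. The sketch below is not the authors' implementation; it is a minimal, hypothetical PyTorch illustration of that fusion idea, with all module names, feature dimensions, and the GRU decoder chosen for illustration, and with the adversarial discriminator and training loop omitted.

```python
# Hypothetical sketch only: acoustic frames attend to themselves (self-modality
# attention) and to phoneme embeddings (cross-modality attention) before a
# decoder predicts per-frame blendshape weights. Not the paper's architecture.
import torch
import torch.nn as nn


class CrossModalLipGenerator(nn.Module):
    def __init__(self, acoustic_dim=40, num_phonemes=45, embed_dim=128,
                 num_heads=4, num_blendshapes=10):
        super().__init__()
        self.acoustic_proj = nn.Linear(acoustic_dim, embed_dim)     # acoustic frames -> embeddings
        self.phoneme_embed = nn.Embedding(num_phonemes, embed_dim)  # phoneme ids -> embeddings
        # Self-modality attention over the acoustic sequence.
        self.self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Cross-modality attention: acoustic queries attend to phoneme keys/values.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.decoder = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.out = nn.Linear(embed_dim, num_blendshapes)            # blendshape weights per frame

    def forward(self, acoustic, phoneme_ids):
        a = self.acoustic_proj(acoustic)            # (B, T_a, D)
        p = self.phoneme_embed(phoneme_ids)         # (B, T_p, D)
        a, _ = self.self_attn(a, a, a)              # which acoustic frames matter
        fused, attn_w = self.cross_attn(a, p, p)    # align acoustic frames with phonemes
        h, _ = self.decoder(fused)
        return self.out(h), attn_w                  # blendshape trajectory + attention map


# Toy usage: batch of 2 utterances, 100 acoustic frames, 20 phonemes each.
model = CrossModalLipGenerator()
blendshapes, attn = model(torch.randn(2, 100, 40), torch.randint(0, 45, (2, 20)))
print(blendshapes.shape, attn.shape)  # torch.Size([2, 100, 10]) torch.Size([2, 100, 20])
```

The returned attention map is what makes this kind of model inspectable: each row shows how strongly an acoustic frame attends to each phoneme, which mirrors the paper's stated goal of understanding which frames drive the generated lip movements.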