Animating Face using Disentangled Audio Representations

Cited by: 0
Authors
Mittal, Gaurav [1]
Wang, Baoyuan [1]
Affiliation
[1] Microsoft, Redmond, WA 98053 USA
DOI
10.1109/wacv45572.2020.9093527
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Previous methods for audio-driven talking head generation assume that the input audio is clean and has a neutral tone. As we show empirically, these systems can easily be broken by simply adding certain kinds of background noise to the utterance or by changing its emotional tone (for example, to sad). To make talking head generation robust to such variations, we propose an explicit audio representation learning framework that disentangles audio sequences into factors such as phonetic content, emotional tone, and background noise, among others. Our experiments validate that, when conditioned on the disentangled content representation, the mouth movement generated by our model is significantly more accurate than that of previous approaches (without disentangled learning) in the presence of noise and emotional variation. We further demonstrate that our framework is compatible with current state-of-the-art approaches by replacing their original audio representation learning component with ours. To the best of our knowledge, this is the first work to improve the performance of talking head generation from the perspective of disentangled audio representations, which is important for many real-world applications.
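The abstract's central claim is that a mouth-movement generator conditioned only on the content factor should be unaffected by changes in emotional tone or background noise. The Python sketch below illustrates that idea under loudly stated assumptions: the latent vector, its factor layout (`FACTOR_SIZES`), and the helpers `split_latent` and `predict_mouth_openness` are all hypothetical, and the toy linear decoder merely stands in for whatever network the paper actually uses. It does not show the hard part, learning an encoder whose slices really separate these factors; it only shows why, once they are separated, conditioning on the content slice makes the output invariant to the other slices.

```python
from typing import Dict, List, Sequence

# Hypothetical factor layout (names and sizes are illustrative assumptions,
# not the dimensions used in the paper). Dicts keep insertion order (3.7+),
# so the slices are read in the order listed here.
FACTOR_SIZES: Dict[str, int] = {"content": 4, "emotion": 2, "noise": 2}


def split_latent(z: Sequence[float]) -> Dict[str, List[float]]:
    """Partition a flat latent vector into named, non-overlapping factor slices."""
    expected = sum(FACTOR_SIZES.values())
    if len(z) != expected:
        raise ValueError(f"expected a latent of length {expected}, got {len(z)}")
    factors: Dict[str, List[float]] = {}
    start = 0
    for name, size in FACTOR_SIZES.items():
        factors[name] = list(z[start:start + size])
        start += size
    return factors


def predict_mouth_openness(factors: Dict[str, List[float]],
                           weights: Sequence[float]) -> float:
    """Toy linear decoder (hypothetical) that reads ONLY the content slice."""
    content = factors["content"]
    if len(weights) != len(content):
        raise ValueError("one weight per content dimension is required")
    return sum(w * c for w, c in zip(weights, content))


if __name__ == "__main__":
    weights = [0.5, -0.25, 1.0, 0.25]
    # Same phonetic content; the second latent has different emotion/noise slices.
    clean_neutral = [0.2, 0.4, 0.1, 0.8, 0.0, 0.0, 0.0, 0.0]
    noisy_sad = [0.2, 0.4, 0.1, 0.8, -0.9, 0.3, 1.5, -2.0]
    a = predict_mouth_openness(split_latent(clean_neutral), weights)
    b = predict_mouth_openness(split_latent(noisy_sad), weights)
    print(a == b)  # True: changing emotion/noise leaves the prediction unchanged
```

In a trained system this invariance is only as good as the disentanglement itself: if noise or emotion information leaks into the content slice, the decoder's output will still change with them.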
Pages: 3279-3287
Number of pages: 9
Related papers
50 in total
  • [21] Deep Disentangled Representations for Volumetric Reconstruction
    Grant, Edward
    Kohli, Pushmeet
    van Gerven, Marcel
    COMPUTER VISION - ECCV 2016 WORKSHOPS, PT III, 2016, 9915 : 266 - 279
  • [22] A Contrastive Objective for Learning Disentangled Representations
    Kahana, Jonathan
    Hoshen, Yedid
    COMPUTER VISION, ECCV 2022, PT XXVI, 2022, 13686 : 579 - 595
  • [23] An Identifiable Double VAE For Disentangled Representations
    Mita, Graziano
    Filippone, Maurizio
    Michiardi, Pietro
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [24] Learning disentangled representations in the imaging domain
    Liu, Xiao
    Sanchez, Pedro
    Thermos, Spyridon
    O'Neil, Alison Q.
    Tsaftaris, Sotirios A.
    MEDICAL IMAGE ANALYSIS, 2022, 80
  • [25] Adversarial Robustness through Disentangled Representations
    Yang, Shuo
    Guo, Tianyu
    Wang, Yunhe
    Xu, Chang
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 3145 - 3153
  • [26] STOCHASTIC VIDEO GENERATION WITH DISENTANGLED REPRESENTATIONS
    Li, Maomao
    Yuan, Chun
    Lin, Zhihui
    Zheng, Zhuobin
    Cheng, Yangyang
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 224 - 229
  • [27] Improving Explainability of Disentangled Representations using Multipath-Attribution Mappings
    Klein, Lukas
    Carvalho, Joao B. S.
    El-Assady, Mennatallah
    Penna, Paolo
    Buhmann, Joachim M.
    Jaeger, Paul
    INTERNATIONAL CONFERENCE ON MEDICAL IMAGING WITH DEEP LEARNING, VOL 172, 2022, 172 : 689 - 712
  • [28] A Commentary on the Unsupervised Learning of Disentangled Representations
    Locatello, Francesco
    Bauer, Stefan
    Lucic, Mario
    Raetsch, Gunnar
    Gelly, Sylvain
    Schoelkopf, Bernhard
    Bachem, Olivier
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13681 - 13684
  • [29] On Learning Disentangled Representations for Gait Recognition
    Zhang, Ziyuan
    Tran, Luan
    Liu, Feng
    Liu, Xiaoming
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (01) : 345 - 360
  • [30] Better representations: Invariant, disentangled and reusable
    Montavon, Grégoire
    Müller, Klaus-Robert
    LECTURE NOTES IN COMPUTER SCIENCE, 2012, 7700 : 559 - 560