Collaborative Diffusion for Multi-Modal Face Generation and Editing

Cited by: 18
Authors
Huang, Ziqi [1 ]
Chan, Kelvin C. K. [1 ]
Jiang, Yuming [1 ]
Liu, Ziwei [1 ]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
DOI
10.1109/CVPR52729.2023.00589
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Diffusion models have recently emerged as a powerful generative tool. Despite this progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further unleash users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. Our key insight is that diffusion models driven by different modalities are inherently complementary in their latent denoising steps, upon which bilateral connections can be established. Specifically, we propose the dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model. Collaborative Diffusion not only combines the generation capabilities of uni-modal diffusion models, but also integrates multiple uni-modal manipulations to perform multi-modal editing. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in both image quality and condition consistency.
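To make the fused denoising step concrete, the following PyTorch sketch shows one plausible reading of the dynamic diffuser: a small meta-network predicts a spatial-temporal influence map for each frozen uni-modal model, the maps are normalized across collaborators, and the weighted sum of the uni-modal noise predictions forms the multi-modal denoising step. This is a minimal illustration under assumptions, not the authors' released code; DynamicDiffuser, collaborative_denoise_step, the layer sizes, the softmax normalization, and the call signature model(x_t, t, cond) are all hypothetical.

import torch
import torch.nn as nn

class DynamicDiffuser(nn.Module):
    # Hypothetical meta-network: predicts a per-pixel influence logit for one
    # pre-trained uni-modal diffusion model, conditioned on the noisy latent
    # x_t and the timestep t, so the influence varies in space and time.
    def __init__(self, latent_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_channels + 1, 32, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Broadcast the (normalized) timestep as an extra input channel.
        t_map = t.float().view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[-2:])
        return self.net(torch.cat([x_t, t_map], dim=1))

def collaborative_denoise_step(x_t, t, models, conditions, diffusers):
    # Each frozen uni-modal model predicts noise for its own condition
    # (assumed signature m(x_t, t, c)); influence logits are softmax-
    # normalized across collaborators so the per-pixel weights sum to one,
    # and the blended prediction forms the multi-modal denoising step.
    eps_preds = [m(x_t, t, c) for m, c in zip(models, conditions)]
    logits = torch.stack([d(x_t, t) for d in diffusers], dim=0)  # (M, B, 1, H, W)
    weights = torch.softmax(logits, dim=0)
    eps = sum(w * e for w, e in zip(weights, eps_preds))
    return eps  # plugged into the standard DDPM/DDIM update for x_{t-1}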
Pages: 6080-6090
Page count: 11
Related Papers
50 records in total
  • [21] Development of an integrated multi-modal communication robotic face
    Pierce, Brennand
    Kuratate, Takaaki
    Maejima, Akinobu
    Morishima, Shigeo
    Matsusaka, Yosuke
    Durkovic, Marko
    Diepold, Klaus
    Cheng, Gordon
    2012 IEEE WORKSHOP ON ADVANCED ROBOTICS AND ITS SOCIAL IMPACTS (ARSO), 2012, : 104 - +
  • [22] Face Recognition using Multi-modal Binary Patterns
    Thanh Phuong Nguyen
    Ngoc-Son Vu
    Caplier, Alice
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 2343 - 2346
  • [23] An Overview of Multi-Modal Biometrics Based on Face and Ear
    Zhang, Haijun
    Huang, Zengxi
    Li, Yibo
    2009 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND LOGISTICS (ICAL 2009), VOLS 1-3, 2009, : 1705 - 1709
  • [24] Multi-modal face tracking using Bayesian network
    Liu, F
    Lin, XY
    Li, SZ
    Shi, YC
    IEEE INTERNATIONAL WORKSHOP ON ANALYSIS AND MODELING OF FACE AND GESTURES, 2003, : 135 - 142
  • [25] A Multi-Modal Chinese Poetry Generation Model
    Liu, Dayiheng
    Guo, Quan
    Li, Wubo
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
  • [26] Research and Implementation of Multi-modal Face Recognition Algorithm
    Ye Jihua
    Xia Guomiao
    Hu Dan
    2013 25TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2013, : 2086 - 2090
  • [27] Lightweight multi-modal emotion recognition model based on modal generation
    Liu, Peisong
    Che, Manqiang
    Luo, Jiangchuan
    2022 9TH INTERNATIONAL FORUM ON ELECTRICAL ENGINEERING AND AUTOMATION, IFEEA, 2022, : 430 - 435
  • [29] A collaborative framework for multi-modal information exchange in varying environments
    Hogue, Isaac
    McQuay, William
    CTS 2007: PROCEEDINGS OF THE 2007 INTERNATIONAL SYMPOSIUM ON COLLABORATIVE TECHNOLOGIES AND SYSTEMS, 2007, : 159 - +
  • [30] A Collaborative Interaction and Visualization Multi-Modal Environment for Surgical Planning
    Foo, Jung Leng
    Martinez-Escobar, Marisol
    Peloquin, Catherine
    Lobe, Thom
    Winer, Eliot
    MEDICINE MEETS VIRTUAL REALITY 17 - NEXTMED: DESIGN FOR/THE WELL BEING, 2009, 142 : 97 - 102