Collaborative Diffusion for Multi-Modal Face Generation and Editing

Cited by: 18
Authors
Huang, Ziqi [1 ]
Chan, Kelvin C. K. [1 ]
Jiang, Yuming [1 ]
Liu, Ziwei [1 ]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
DOI
10.1109/CVPR52729.2023.00589
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Diffusion models have recently emerged as a powerful generative tool. Despite this progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further unleash users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. Our key insight is that diffusion models driven by different modalities are inherently complementary in their latent denoising steps, upon which bilateral connections can be established. Specifically, we propose the dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model. Collaborative Diffusion not only combines the generation capabilities of uni-modal diffusion models, but also integrates multiple uni-modal manipulations to perform multi-modal editing. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in both image quality and condition consistency.
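To make the fused denoising step concrete, the following PyTorch sketch shows one plausible reading of the dynamic diffuser: a small meta-network predicts a spatial-temporal influence map for each frozen uni-modal model, the maps are normalized across collaborators, and the weighted sum of the uni-modal noise predictions forms the multi-modal denoising step. This is a minimal illustration under assumptions, not the authors' released code; DynamicDiffuser, collaborative_denoise_step, the layer sizes, the softmax normalization, and the call signature model(x_t, t, cond) are all hypothetical.

import torch
import torch.nn as nn

class DynamicDiffuser(nn.Module):
    # Hypothetical meta-network: predicts a per-pixel influence logit for one
    # pre-trained uni-modal diffusion model, conditioned on the noisy latent
    # x_t and the timestep t, so the influence varies in space and time.
    def __init__(self, latent_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_channels + 1, 32, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Broadcast the (normalized) timestep as an extra input channel.
        t_map = t.float().view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[-2:])
        return self.net(torch.cat([x_t, t_map], dim=1))

def collaborative_denoise_step(x_t, t, models, conditions, diffusers):
    # Each frozen uni-modal model predicts noise for its own condition
    # (assumed signature m(x_t, t, c)); influence logits are softmax-
    # normalized across collaborators so the per-pixel weights sum to one,
    # and the blended prediction forms the multi-modal denoising step.
    eps_preds = [m(x_t, t, c) for m, c in zip(models, conditions)]
    logits = torch.stack([d(x_t, t) for d in diffusers], dim=0)  # (M, B, 1, H, W)
    weights = torch.softmax(logits, dim=0)
    eps = sum(w * e for w, e in zip(weights, eps_preds))
    return eps  # plugged into the standard DDPM/DDIM update for x_{t-1}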
Pages: 6080-6090
Page count: 11
Related Papers
50 records in total
  • [21] Development of an integrated multi-modal communication robotic face
    Pierce, Brennand
    Kuratate, Takaaki
    Maejima, Akinobu
    Morishima, Shigeo
    Matsusaka, Yosuke
    Durkovic, Marko
    Diepold, Klaus
    Cheng, Gordon
    2012 IEEE WORKSHOP ON ADVANCED ROBOTICS AND ITS SOCIAL IMPACTS (ARSO), 2012, : 104 - +
  • [22] Face Recognition using Multi-modal Binary Patterns
    Thanh Phuong Nguyen
    Ngoc-Son Vu
    Caplier, Alice
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 2343 - 2346
  • [23] An Overview of Multi-Modal Biometrics Based on Face and Ear
    Zhang, Haijun
    Huang, Zengxi
    Li, Yibo
    2009 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND LOGISTICS (ICAL 2009), VOLS 1-3, 2009, : 1705 - 1709
  • [24] Multi-modal face tracking using Bayesian network
    Liu, F
    Lin, XY
    Li, SZ
    Shi, YC
    IEEE INTERNATIONAL WORKSHOP ON ANALYSIS AND MODELING OF FACE AND GESTURES, 2003, : 135 - 142
  • [25] A Multi-Modal Chinese Poetry Generation Model
    Liu, Dayiheng
    Guo, Quan
    Li, Wubo
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
  • [26] Research and Implementation of Multi-modal Face Recognition Algorithm
    Ye Jihua
    Xia Guomiao
    Hu Dan
    2013 25TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2013, : 2086 - 2090
  • [27] Lightweight multi-modal emotion recognition model based on modal generation
    Liu, Peisong
    Che, Manqiang
    Luo, Jiangchuan
    2022 9TH INTERNATIONAL FORUM ON ELECTRICAL ENGINEERING AND AUTOMATION, IFEEA, 2022, : 430 - 435
  • [29] A collaborative framework for multi-modal information exchange in varying environments
    Hogue, Isaac
    McQuay, William
    CTS 2007: PROCEEDINGS OF THE 2007 INTERNATIONAL SYMPOSIUM ON COLLABORATIVE TECHNOLOGIES AND SYSTEMS, 2007, : 159 - +
  • [30] A Collaborative Interaction and Visualization Multi-Modal Environment for Surgical Planning
    Foo, Jung Leng
    Martinez-Escobar, Marisol
    Peloquin, Catherine
    Lobe, Thom
    Winer, Eliot
    MEDICINE MEETS VIRTUAL REALITY 17 - NEXTMED: DESIGN FOR/THE WELL BEING, 2009, 142 : 97 - 102