MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

Cited by: 0

Authors:

Tang, Shitao [1]

Zhang, Fuyang [1]

Chen, Jiacheng [1]

Wang, Peng [2]

Furukawa, Yasutaka [1]

Affiliations:

[1] Simon Fraser University, Burnaby, BC, Canada

[2] ByteDance, Beijing, People's Republic of China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images from text prompts given pixel-to-pixel correspondences (e.g., perspective crops from a panorama or multi-view images given depth maps and poses). Unlike prior methods that rely on iterative image warping and inpainting, MVDiffusion simultaneously generates all images with a global awareness, effectively addressing the prevalent error accumulation issue. At its core, MVDiffusion processes perspective images in parallel with a pre-trained text-to-image diffusion model, while integrating novel correspondence-aware attention layers to facilitate cross-view interactions. For panorama generation, while only trained with 10k panoramas, MVDiffusion is able to generate high-resolution photorealistic images for arbitrary texts or extrapolate one perspective image to a 360-degree view. For multi-view depth-to-image generation, MVDiffusion demonstrates state-of-the-art performance for texturing a scene mesh. The project page is at https://mvdiffusion.github.io/.

Pages: 32
