MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

Cited by: 0
Authors
Tang, Shitao [1]
Zhang, Fuyang [1]
Chen, Jiacheng [1]
Wang, Peng [2]
Furukawa, Yasutaka [1]
Affiliations
[1] Simon Fraser Univ, Burnaby, BC, Canada
[2] Bytedance, Beijing, People's Republic of China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
(none listed)
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images from text prompts given pixel-to-pixel correspondences (e.g., perspective crops from a panorama or multi-view images given depth maps and poses). Unlike prior methods that rely on iterative image warping and inpainting, MVDiffusion simultaneously generates all images with a global awareness, effectively addressing the prevalent error accumulation issue. At its core, MVDiffusion processes perspective images in parallel with a pre-trained text-to-image diffusion model, while integrating novel correspondence-aware attention layers to facilitate cross-view interactions. For panorama generation, while only trained with 10k panoramas, MVDiffusion is able to generate high-resolution photorealistic images for arbitrary texts or extrapolate one perspective image to a 360-degree view. For multi-view depth-to-image generation, MVDiffusion demonstrates state-of-the-art performance for texturing a scene mesh. The project page is at https://mvdiffusion.github.io/.
Pages: 32
Related papers
50 in total
  • [41] MVDD: Multi-view Depth Diffusion Models
    Wang, Zhen
    Xu, Qiangeng
    Tan, Feitong
    Chai, Menglei
    Liu, Shichen
    Pandey, Rohit
    Fanelli, Sean
    Kadambi, Achuta
    Zhang, Yinda
    COMPUTER VISION - ECCV 2024, PT XIII, 2025, 15071 : 236 - 253
  • [42] Graph Structure Aware Contrastive Multi-View Clustering
    Chen, Rui
    Tang, Yongqiang
    Cai, Xiangrui
    Yuan, Xiaojie
    Feng, Wenlong
    Zhang, Wensheng
    IEEE TRANSACTIONS ON BIG DATA, 2024, 10 (03) : 260 - 274
  • [43] A Local Correspondence-Aware Hybrid CNN-GCN Model for Single-Image Human Body Reconstruction
    Sun, Qingping
    Xiao, Yi
    Zhang, Jie
    Zhou, Shizhe
    Leung, Chi-Sing
    Su, Xin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 4679 - 4690
  • [44] Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis
    Zhang, Xuanmeng
    Zheng, Zhedong
    Gao, Daiheng
    Zhang, Bang
    Pan, Pan
    Yang, Yi
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18429 - 18438
  • [45] Beyond global fusion: A group-aware fusion approach for multi-view image clustering
    Xue, Zhe
    Li, Guorong
    Wang, Shuhui
    Huang, Jun
    Zhang, Weigang
    Huang, Qingming
    INFORMATION SCIENCES, 2019, 493 : 176 - 191
  • [46] Image Classification Via Multi-View Model
    Cheng, Yanyun
    Zhu, Songhao
    Liang, Zhiwei
    Xu, Guozheng
    PROCEEDINGS OF THE 28TH CHINESE CONTROL AND DECISION CONFERENCE (2016 CCDC), 2016, : 3333 - 3337
  • [47] Probabilistic multi-view correspondence in a distributed setting with no central server
    Avidan, S
    Moses, Y
    Moses, Y
    COMPUTER VISION - ECCV 2004, PT 4, 2004, 2034 : 428 - 441
  • [48] SketchDesc: Learning Local Sketch Descriptors for Multi-View Correspondence
    Yu, Deng
    Li, Lei
    Zheng, Youyi
    Lau, Manfred
    Song, Yi-Zhe
    Tai, Chiew-Lan
    Fu, Hongbo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (05) : 1738 - 1750
  • [49] Image selection for improved multi-view stereo
    Hornung, Alexander
    Zeng, Boyi
    Kobbelt, Leif
    2008 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-12, 2008, : 2696 - 2703
  • [50] The Research Based on Multi-view Image Registration
    Wu, KaiXing
    Hao, Juan
    Wang, ChunHua
    APPLIED INFORMATICS AND COMMUNICATION, PT 4, 2011, 227 : 381 - 387