Person Image Synthesis via Denoising Diffusion Model

Cited by: 15
Authors
Bhunia, Ankan Kumar [1]
Khan, Salman [1, 2]
Cholakkal, Hisham [1]
Anwer, Rao Muhammad [1, 4]
Laaksonen, Jorma [4]
Shah, Mubarak [5]
Khan, Fahad Shahbaz [1, 3]
Affiliations
[1] Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
[2] Australian National University, Canberra, ACT, Australia
[3] Linköping University, Linköping, Sweden
[4] Aalto University, Espoo, Finland
[5] University of Central Florida, Orlando, FL, USA
DOI
10.1109/CVPR52729.2023.00578
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The pose-guided person image generation task requires synthesizing photorealistic images of humans in arbitrary poses. The existing approaches use generative adversarial networks that do not necessarily maintain realistic textures or need dense correspondences that struggle to handle complex deformations and severe occlusions. In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution. Our proposed Person Image Diffusion Model (PIDM) disintegrates the complex transfer problem into a series of simpler forward-backward denoising steps. This helps in learning plausible source-to-target transformation trajectories that result in faithful textures and undistorted appearance details. We introduce a 'texture diffusion module' based on cross-attention to accurately model the correspondences between appearance and pose information available in source and target images. Further, we propose 'disentangled classifier-free guidance' to ensure close resemblance between the conditional inputs and the synthesized output in terms of both pose and appearance information. Our extensive results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios. We also show how our generated images can help in downstream tasks. Code is available at https://github.com/ankanbhunia/PIDM.
Pages: 5968-5976
Number of pages: 9