Wavelet Diffusion Models are fast and scalable Image Generators

Cited by: 15
Authors
Phung, Hao [1]
Dao, Quan [1]
Tran, Anh [1]
Affiliations
[1] VinAI Res, Hanoi, Vietnam
DOI
10.1109/CVPR52729.2023.00983
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Diffusion models are emerging as a powerful solution for high-fidelity image generation, exceeding GANs in quality in many circumstances. However, their slow training and inference speeds are a major bottleneck that blocks their use in real-time applications. A recent DiffusionGAN method significantly decreases running time by reducing the number of sampling steps from thousands to several, but its speed still lags well behind that of GAN counterparts. This paper aims to reduce the speed gap by proposing a novel wavelet-based diffusion scheme. We extract low- and high-frequency components at both the image and feature levels via wavelet decomposition and adaptively handle these components for faster processing while maintaining good generation quality. Furthermore, we propose a reconstruction term that effectively boosts training convergence. Experimental results on the CelebA-HQ, CIFAR-10, LSUN-Church, and STL-10 datasets show that our solution is a stepping stone toward real-time, high-fidelity diffusion models. Our code and pre-trained checkpoints are available at https://github.com/VinAIResearch/WaveDiff.git.
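The wavelet decomposition mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single-level 2D Haar wavelet, which the abstract does not specify, and it is not taken from the authors' repository; the function names `haar_dwt2` and `haar_idwt2` are hypothetical. It shows how an image splits into one low-frequency subband and three high-frequency subbands, each at half the spatial resolution, and how the original is recovered exactly.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform (illustrative; assumes even height and width).

    Returns four subbands, each half the size of x along both axes.
    Subband naming (LH vs. HL) varies between libraries; this is one convention.
    """
    x = np.asarray(x, dtype=np.float64)
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))  # stand-in for one image channel
    subbands = haar_dwt2(image)
    restored = haar_idwt2(*subbands)
    print([s.shape for s in subbands])
    print(np.allclose(image, restored))
```

Because each subband has a quarter of the original pixel count, a model operating on the stacked subbands works at half the spatial resolution, which is the general reason wavelet-domain processing can be cheaper.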
Pages: 10199-10208
Number of pages: 10