LaMD: Latent Motion Diffusion for Image-Conditional Video Generation

Cited by: 0
Authors
Hu, Yaosi [1]
Chen, Zhenzhong [1]
Luo, Chong [2]
Affiliations
[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video generation; Video prediction; Diffusion model; Motion generation;
DOI
10.1007/s11263-025-02386-7
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The video generation field has witnessed rapid improvements with the introduction of recent diffusion models. While these models have successfully enhanced appearance quality, they still face challenges in generating coherent and natural movements while efficiently sampling videos. In this paper, we propose to condense video generation into a problem of motion generation, to improve the expressiveness of motion and make video generation more manageable. This can be achieved by breaking down the video generation process into latent motion generation and video reconstruction. Specifically, we present a latent motion diffusion (LaMD) framework, which consists of a motion-decomposed video autoencoder and a diffusion-based motion generator, to implement this idea. Through careful design, the motion-decomposed video autoencoder can compress patterns in movement into a concise latent motion representation. Consequently, the diffusion-based motion generator is able to efficiently generate realistic motion on a continuous latent space under multi-modal conditions, at a cost that is similar to that of image diffusion models. Results show that LaMD generates high-quality videos on various benchmark datasets, including BAIR, Landscape, NATOPS, MUG and CATER-GEN, which encompass a variety of stochastic dynamics and highly controllable movements across multiple image-conditional video generation tasks, while significantly decreasing sampling time.
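The two-stage split described in the abstract (latent motion generation followed by video reconstruction) can be pictured with a toy data-flow sketch. Everything below is a hypothetical stand-in and not the authors' implementation: the function names, the 4-dimensional latent, and the arithmetic inside each step are assumptions chosen only to show that the iterative sampling loop runs over a small motion latent while the decoder is called once.

```python
import random


def encode_content(first_frame):
    # Hypothetical content encoder; a real one would be a learned network.
    return list(first_frame)


def sample_latent_motion(content, latent_dim=4, steps=10, seed=0):
    # Toy stand-in for a diffusion-style sampler: start from Gaussian noise
    # and refine it iteratively. The loop touches only `latent_dim` values,
    # never the full video.
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    target = sum(content) / len(content)  # toy conditioning signal (assumption)
    for _ in range(steps):
        # Placeholder for one reverse step; a trained denoiser would go here.
        z = [v + 0.5 * (target - v) for v in z]
    return z


def decode_video(content, motion, num_frames):
    # Toy decoder: combine content with the motion latent once per frame.
    frames = []
    for t in range(num_frames):
        frame = [c + t * motion[i % len(motion)] for i, c in enumerate(content)]
        frames.append(frame)
    return frames


def generate_video(first_frame, num_frames=5):
    content = encode_content(first_frame)
    motion = sample_latent_motion(content)
    return decode_video(content, motion, num_frames)


video = generate_video([0.1, 0.2, 0.3], num_frames=5)
print(len(video), len(video[0]))
```

In this sketch the first output frame equals the conditioning image because the toy decoder adds zero motion at t = 0; that is a property of the placeholder decoder, not a claim about LaMD.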
Pages: 17
Related papers
50 in total
  • [1] Conditional Image-to-Video Generation with Latent Flow Diffusion Models
    Ni, Haomiao
    Shi, Changhao
    Li, Kai
    Huang, Sharon X.
    Min, Martin Renqiang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 18444 - 18455
  • [2] Decouple Content and Motion for Conditional Image-to-Video Generation
    Shen, Cuifeng
    Gan, Yulu
    Chen, Chen
    Zhu, Xiongwei
    Cheng, Lele
    Gao, Tingting
    Wang, Jinzhi
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4757 - 4765
  • [3] Canvas GAN: Bootstrapped Image-Conditional Models
    Amodio, Matthew
    Krishnaswamy, Smita
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [4] Compositional GAN: Learning Image-Conditional Binary Composition
    Azadi, Samaneh
    Pathak, Deepak
    Ebrahimi, Sayna
    Darrell, Trevor
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2020, 128 (10-11) : 2570 - 2585
  • [5] Compositional GAN: Learning Image-Conditional Binary Composition
    Samaneh Azadi
    Deepak Pathak
    Sayna Ebrahimi
    Trevor Darrell
    International Journal of Computer Vision, 2020, 128 : 2570 - 2585
  • [6] Latent diffusion model for conditional reservoir facies generation
    Lee, Daesoo
    Ovanger, Oscar
    Eidsvik, Jo
    Aune, Erlend
    Skauvold, Jacob
    Hauge, Ragnar
    COMPUTERS & GEOSCIENCES, 2025, 194
  • [7] Conditional Text Image Generation with Diffusion Models
    Zhu, Yuanzhi
    Li, Zhaohai
    Wang, Tianwei
    He, Mengchao
    Yao, Cong
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 14235 - 14245
  • [8] MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
    Voleti, Vikram
    Jolicoeur-Martineau, Alexia
    Pal, Christopher
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [9] Medical Image Generation based on Latent Diffusion Models
    Song, Wenbo
    Jiang, Yan
    Fang, Yin
    Cao, Xinyu
    Wu, Peiyan
    Xing, Hanshuo
    Wu, Xinglong
    2023 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE INNOVATION, ICAII 2023, 2023, : 89 - 93
  • [10] LaMoD: Latent Motion Diffusion Model for Myocardial Strain Generation
    Xing, Jiarui
    Jayakumar, Nivetha
    Wu, Nian
    Wang, Yu
    Epstein, Frederick H.
    Zhang, Miaomiao
    SHAPE IN MEDICAL IMAGING, SHAPEMI 2024, 2025, 15275 : 164 - 177