LaMD: Latent Motion Diffusion for Image-Conditional Video Generation

Times cited: 0
Authors
Hu, Yaosi [1]
Chen, Zhenzhong [1]
Luo, Chong [2]
Affiliations
[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video generation; Video prediction; Diffusion model; Motion generation
DOI
10.1007/s11263-025-02386-7
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The video generation field has witnessed rapid improvements with the introduction of recent diffusion models. While these models have successfully enhanced appearance quality, they still struggle to generate coherent and natural movements while sampling videos efficiently. In this paper, we propose condensing video generation into a problem of motion generation, which improves the expressiveness of motion and makes video generation more manageable. This is achieved by breaking down the video generation process into latent motion generation and video reconstruction. Specifically, we present a latent motion diffusion (LaMD) framework, which consists of a motion-decomposed video autoencoder and a diffusion-based motion generator, to implement this idea. Through careful design, the motion-decomposed video autoencoder can compress patterns in movement into a concise latent motion representation. Consequently, the diffusion-based motion generator is able to efficiently generate realistic motion in a continuous latent space under multi-modal conditions, at a cost similar to that of image diffusion models. Results show that LaMD generates high-quality videos on various benchmark datasets, including BAIR, Landscape, NATOPS, MUG and CATER-GEN, which encompass a variety of stochastic dynamics and highly controllable movements across multiple image-conditional video generation tasks, while significantly decreasing sampling time.
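The abstract splits generation into two stages: a diffusion model samples a compact latent motion representation conditioned on the input image, and a decoder reconstructs the video from the image plus that motion. The Python sketch below illustrates only the generic shape of such a pipeline, using standard DDPM ancestral sampling over a small latent vector. The noise predictor `toy_eps_model`, the decoder `toy_decode`, the linear beta schedule, the conditioning vector and all dimensions are hypothetical placeholders chosen for illustration; they are not the LaMD architecture or its trained networks.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM choice, assumed here)."""
    if T == 1:
        return [beta_start]
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def toy_eps_model(z_t, t, cond):
    """Hypothetical stand-in for a trained, image-conditioned noise-prediction network."""
    return [0.1 * (z - c) for z, c in zip(z_t, cond)]


def sample_latent_motion(cond, T=50, seed=0):
    """DDPM ancestral sampling of a latent motion vector with the same length as `cond`."""
    rng = random.Random(seed)
    betas = linear_beta_schedule(T)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)  # cumulative product \bar{alpha}_t

    z = [rng.gauss(0.0, 1.0) for _ in cond]  # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = toy_eps_model(z, t, cond)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Posterior mean: (z_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        mean = [(zi - coef * ei) / math.sqrt(alphas[t]) for zi, ei in zip(z, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # sigma_t^2 = beta_t variant
            z = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            z = mean  # no noise is added at the final step
    return z


def toy_decode(image, motion, num_frames=4):
    """Hypothetical decoder: frame k is the conditioning image shifted by k * mean(motion)."""
    shift = sum(motion) / len(motion)
    return [[p + k * shift for p in image] for k in range(num_frames)]


if __name__ == "__main__":
    image = [0.0, 0.5, 1.0]   # toy "image" (flattened pixels)
    cond = [0.1, -0.2, 0.3]   # toy image embedding used as the condition
    motion = sample_latent_motion(cond, T=20, seed=1)
    frames = toy_decode(image, motion)
    print(len(motion), len(frames))
```

The design point the sketch mirrors is that the iterative denoising loop touches only the short motion vector, while appearance comes from the conditioning image at decode time; in this toy decoder the first frame is exactly the input image.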
Pages: 17