MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance

Times cited: 0
Authors
Chu, Ernie [1]
Huang, Tzuhsuan [1]
Lin, Shuo-Yen [1]
Chen, Jun-Cheng [1]
Affiliation
[1] Academia Sinica, Research Center for Information Technology Innovation, 128 Academia Rd, Sec. 2, Taipei, Taiwan
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This study introduces MeDM, an efficient and effective method that uses pre-trained image diffusion models for video-to-video translation with consistent temporal flow. The proposed framework can render videos from scene position information, such as a normal G-buffer, or perform text-guided editing on videos captured in real-world scenarios. We employ explicit optical flows to construct a practical coding that enforces physical constraints on generated frames and mediates independent frame-wise scores. With this coding, maintaining temporal consistency in the generated videos can be framed as an optimization problem with a closed-form solution. To ensure compatibility with Stable Diffusion, we also propose a workaround for modifying observation-space scores in latent diffusion models. Notably, MeDM requires neither fine-tuning nor test-time optimization of the diffusion models. Extensive qualitative, quantitative, and subjective experiments on various benchmarks demonstrate the effectiveness and superiority of the proposed approach. Our project page can be found at https://medm2023.github.io/.
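The idea of turning temporal consistency into an optimization problem with a closed-form solution can be illustrated with a small sketch. It rests on assumptions that are not stated in the abstract: optical flow has already been converted into a list of pixel correspondences, each frame is a non-empty flat list of scalar intensities, and the objective is plain least squares. Under these assumptions, every group g of pixels linked by flow shares one value z_g, and minimizing the sum over frames t and pixels p of (z_g(t,p) - x_t,p)^2 gives z_g equal to the mean of the group. The names `UnionFind` and `mediate_frames` are hypothetical. This is not the authors' implementation, and it omits the diffusion sampling loop and the latent-space workaround.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable


class UnionFind:
    """Disjoint-set forest that groups pixels linked by optical flow."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        # Path halving keeps the trees shallow.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def mediate_frames(
    frames: list[list[float]],
    correspondences: Iterable[tuple[tuple[int, int], tuple[int, int]]],
) -> list[list[float]]:
    """Project independent per-frame predictions onto a consistent video.

    Hypothetical helper (assumption, not MeDM's code).
    frames: per-frame predictions, each a flat list of scalar pixel values.
    correspondences: pairs ((t1, p1), (t2, p2)) meaning pixel p1 of frame t1
        and pixel p2 of frame t2 show the same scene point (e.g. from flow).
    Returns frames in which each linked group holds its least-squares
    solution, which is simply the group mean.
    """
    num_frames = len(frames)
    width = len(frames[0])
    uf = UnionFind(num_frames * width)

    def index(t: int, p: int) -> int:
        # Flatten (frame, pixel) into one global pixel id.
        return t * width + p

    for (t1, p1), (t2, p2) in correspondences:
        uf.union(index(t1, p1), index(t2, p2))

    # Accumulate per-group sums and counts; the mean minimizes squared error.
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for t, frame in enumerate(frames):
        for p, value in enumerate(frame):
            root = uf.find(index(t, p))
            sums[root] += value
            counts[root] += 1

    # Write the shared group value back into every frame.
    result = []
    for t in range(num_frames):
        row = []
        for p in range(width):
            root = uf.find(index(t, p))
            row.append(sums[root] / counts[root])
        result.append(row)
    return result
```

For example, `mediate_frames([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]], [((0, 0), (1, 1)), ((0, 2), (1, 2))])` returns `[[3.5, 2.0, 5.0], [5.0, 3.5, 5.0]]`: linked pixels are averaged, and unlinked pixels keep their values. A union-find is used so that chains of flow links across several frames collapse into a single group.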
Pages: 1353-1361
Number of pages: 9
Related papers
50 in total
  • [41] Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
    Khachatryan, Levon
    Movsisyan, Andranik
    Tadevosyan, Vahram
    Henschel, Roberto
    Wang, Zhangyang
    Navasardyan, Shant
    Shi, Humphrey
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15908 - 15918
  • [42] FUVT: a deep few-shot unsupervised learning-based video-to-video translation scheme using Kalman filtering and relativistic GAN
    Roohi, Koorosh
    Esmaeilzehi, Alireza
    Ahmad, M. Omair
    SIGNAL IMAGE AND VIDEO PROCESSING, 2025, 19 (05)
  • [43] A society of models for video and image libraries
    Picard, RW
    IBM SYSTEMS JOURNAL, 1996, 35 (3-4) : 292 - 312
  • [44] Society of models for video and image libraries
    MIT Media Laboratory, 20 Ames Street, Cambridge, MA 02139-4307, United States
    IBM Syst J, (3-4): 292 - 312
  • [45] Temporal Consistent Automatic Video Colorization via Semantic Correspondence
    Beijing University of Posts and Telecommunications, School of Artificial Intelligence, China
    IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recogn. Workshops, 2160, : 1836 - 1845
  • [46] Application of inhomogeneous diffusion to image and video coding
    Ford, GE
    THIRTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2, 1997, : 926 - 930
  • [47] TEMPORAL RAIN DECOMPOSITION WITH SPATIAL STRUCTURE GUIDANCE FOR VIDEO DERAINING
    Xue, Xinwei
    Ding, Ying
    Ma, Long
    Wang, Yi
    Liu, Risheng
    Fan, Xin
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2015 - 2019
  • [48] Robust High-Resolution Video Matting with Temporal Guidance
    Lin, Shanchuan
    Yang, Linjie
    Saleemi, Imran
    Sengupta, Soumyadip
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 3132 - 3141
  • [49] Breaking Temporal Consistency: Generating Video Universal Adversarial Perturbations Using Image Models
    Kim, Hee-Seon
    Son, Minji
    Kim, Minbeom
    Kwon, Myung-Joon
    Kim, Changick
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 4302 - 4311
  • [50] TEXT TO VIDEO USING GANS AND DIFFUSION MODELS
    Singhal, Nikita
    Singh, Praval Pratap
    Singh, Nikhil
    Singh, Mahipal
    Singh, Harsimrat
    JORDANIAN JOURNAL OF COMPUTERS AND INFORMATION TECHNOLOGY, 2024, 10 (02): : 198 - 213