MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance

Cited by: 0

Authors:

Chu, Ernie [1]

Huang, Tzuhsuan [1]

Lin, Shuo-Yen [1]

Chen, Jun-Cheng [1]

Affiliation:

[1] Academia Sinica, Research Center for Information Technology Innovation, 128 Academia Road, Section 2, Taipei, Taiwan

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2 | 2024

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This study introduces an efficient and effective method, MeDM, that utilizes pre-trained image Diffusion Models for video-to-video translation with consistent temporal flow. The proposed framework can render videos from scene position information, such as a normal G-buffer, or perform text-guided editing on videos captured in real-world scenarios. We employ explicit optical flows to construct a practical coding that enforces physical constraints on generated frames and mediates independent frame-wise scores. By leveraging this coding, maintaining temporal consistency in the generated videos can be framed as an optimization problem with a closed-form solution. To ensure compatibility with Stable Diffusion, we also suggest a workaround for modifying observation-space scores in latent Diffusion Models. Notably, MeDM does not require fine-tuning or test-time optimization of the Diffusion Models. Through extensive qualitative, quantitative, and subjective experiments on various benchmarks, the study demonstrates the effectiveness and superiority of the proposed approach. Our project page can be found at https://medm2023.github.io/.
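As an illustration of how such a closed form can arise (a minimal sketch, not the authors' implementation): assume optical flow assigns each pixel to a correspondence group, meaning the set of pixels across frames that depict the same scene point. The values closest in the least-squares sense to the independent per-frame outputs, subject to all pixels in a group being equal, are then simply the group means. The function name `mediate`, the flat value list, and the integer group ids are hypothetical choices made for this example.

```python
from collections import defaultdict


def mediate(values, group_ids):
    """Project per-pixel values onto 'equal within each group'.

    Minimizing sum_i (x_i - values[i])**2 subject to x_i == x_j whenever
    group_ids[i] == group_ids[j] has the closed-form solution
    x_i = mean of values in the group of pixel i.
    (Assumption: groups stand in for optical-flow correspondences.)
    """
    if len(values) != len(group_ids):
        raise ValueError("values and group_ids must have the same length")

    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, group_ids):
        sums[group] += value
        counts[group] += 1

    # Every pixel receives the mean of its own correspondence group.
    return [sums[group] / counts[group] for group in group_ids]


# Toy data: five pixel values drawn from independently generated frames.
# Pixels 0 and 1 track the same scene point, as do pixels 2 and 3.
values = [0.25, 0.75, 1.0, 3.0, 5.0]
group_ids = [0, 0, 1, 1, 2]
mediated = mediate(values, group_ids)
print(mediated)
```

Pixels without a correspondence (group 2 here) keep their original value, since a group of one is its own mean.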


Pages: 1353-1361

Page count: 9

50 items in total

[41] Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
Khachatryan, Levon
Movsisyan, Andranik
Tadevosyan, Vahram
Henschel, Roberto
Wang, Zhangyang
Navasardyan, Shant
Shi, Humphrey
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 15908-15918
[42] FUVT: a deep few-shot unsupervised learning-based video-to-video translation scheme using Kalman filtering and relativistic GAN
Roohi, Koorosh
Esmaeilzehi, Alireza
Ahmad, M. Omair
SIGNAL IMAGE AND VIDEO PROCESSING, 2025, 19 (05)
[43] A society of models for video and image libraries
Picard, RW
IBM SYSTEMS JOURNAL, 1996, 35 (3-4) : 292 - 312
[44] Society of models for video and image libraries
MIT Media Laboratory, 20 Ames Street, Cambridge, MA 02139-4307, United States
Unknown
IBM Syst J, 3-4 (292-312):
[45] Temporal Consistent Automatic Video Colorization via Semantic Correspondence
Beijing University of Posts and Telecommunications, School of Artificial Intelligence, China
Unknown
IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recogn. Workshops, 2160, (1836-1845):
[46] Application of inhomogeneous diffusion to image and video coding
Ford, GE
THIRTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2, 1997: 926-930
[47] TEMPORAL RAIN DECOMPOSITION WITH SPATIAL STRUCTURE GUIDANCE FOR VIDEO DERAINING
Xue, Xinwei
Ding, Ying
Ma, Long
Wang, Yi
Liu, Risheng
Fan, Xin
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 2015-2019
[48] Robust High-Resolution Video Matting with Temporal Guidance
Lin, Shanchuan
Yang, Linjie
Saleemi, Imran
Sengupta, Soumyadip
2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022: 3132-3141
[49] Breaking Temporal Consistency: Generating Video Universal Adversarial Perturbations Using Image Models
Kim, Hee-Seon
Son, Minji
Kim, Minbeom
Kwon, Myung-Joon
Kim, Changick
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 4302-4311
[50] TEXT TO VIDEO USING GANS AND DIFFUSION MODELS
Singhal, Nikita
Singh, Praval Pratap
Singh, Nikhil
Singh, Mahipal
Singh, Harsimrat
JORDANIAN JOURNAL OF COMPUTERS AND INFORMATION TECHNOLOGY, 2024, 10 (02): 198-213
