Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

Times cited: 11

Authors:

Yang, Shuai [1]

Zhou, Yifan [1]

Liu, Ziwei [1]

Loy, Chen Change [1]

Affiliation:

[1] Nanyang Technol Univ, S Lab, Singapore, Singapore

Source:

PROCEEDINGS OF THE SIGGRAPH ASIA 2023 CONFERENCE PAPERS | 2023

Keywords:

Video translation; temporal consistency; off-the-shelf Stable Diffusion; optical flow

DOI:

10.1145/3610548.3618160

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Large text-to-image diffusion models have exhibited impressive proficiency in generating high-quality images. However, when applying these models to the video domain, ensuring temporal consistency across video frames remains a formidable challenge. This paper proposes a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos. The framework includes two parts: key frame translation and full video translation. The first part uses an adapted diffusion model to generate key frames, with hierarchical cross-frame constraints applied to enforce coherence in shapes, textures and colors. The second part propagates the key frames to other frames with temporal-aware patch matching and frame blending. Our framework achieves global style and local texture temporal consistency at a low cost (without re-training or optimization). The adaptation is compatible with existing image diffusion techniques, allowing our framework to take advantage of them, such as customizing a specific subject with LoRA and introducing extra spatial guidance with ControlNet. Extensive experimental results demonstrate the effectiveness of our proposed framework over existing methods in rendering high-quality and temporally coherent videos. Code is available at our project page: https://www.mmlab-ntu.com/project/rerender/
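The abstract's second stage propagates translated key frames to the frames between them and then blends the results. The sketch below is only a rough illustration of the blending idea. It assumes that each in-between frame has already been warped from both neighbouring key frames; the temporal-aware patch matching that would produce those warps is not shown. The names `blend_weight` and `blend_between_keys`, the plain nested-list frame format, and the linear weighting by temporal distance are all hypothetical simplifications. They are not the paper's blending procedure.

```python
from typing import List

# Hypothetical frame format: a grayscale image as rows of pixel intensities.
Frame = List[List[float]]


def blend_weight(t: int, k0: int, k1: int) -> float:
    """Weight given to the later key frame k1 for frame t (assumes k0 <= t <= k1)."""
    if not k0 <= t <= k1:
        raise ValueError("t must lie between the two key frames")
    if k1 == k0:
        return 0.0
    # Linear in temporal distance: frames closer to k1 trust its propagation more.
    return (t - k0) / (k1 - k0)


def blend_between_keys(from_k0: Frame, from_k1: Frame, t: int, k0: int, k1: int) -> Frame:
    """Hypothetical linear cross-fade of two propagated versions of frame t.

    from_k0 / from_k1 are frame t as warped from key frames k0 and k1
    (the warping itself is assumed to have been done elsewhere).
    """
    w = blend_weight(t, k0, k1)
    return [
        [(1.0 - w) * a + w * b for a, b in zip(row0, row1)]
        for row0, row1 in zip(from_k0, from_k1)
    ]


if __name__ == "__main__":
    # Frame 5 sits halfway between key frames 0 and 10, so each side gets weight 0.5.
    print(blend_between_keys([[0.0, 1.0]], [[1.0, 1.0]], t=5, k0=0, k1=10))
```

A uniform cross-fade like this tends to produce ghosting wherever the two propagations disagree. That is one reason to expect a dedicated blending step in the pipeline the abstract describes; its actual details are in the paper.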


Pages: 11

50 records in total

[41] Unleashing the Power of Contrastive Learning for Zero-Shot Video Summarization
Pang, Zongshang
Nakashima, Yuta
Otani, Mayu
Nagahara, Hajime
JOURNAL OF IMAGING, 2024, 10 (09)
[42] SKETCHQL Demonstration: Zero-shot Video Moment Querying with Sketches
Wu, Renzhi
Chunduri, Pramod
Shah, Dristi J
Aravind, Ashmitha Julius
Payani, Ali
Chu, Xu
Arulraj, Joy
Rong, Kexin
PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17 (12): 4429-4432
[43] Language-free Training for Zero-shot Video Grounding
Kim, Dahye
Park, Jungin
Lee, Jiyoung
Park, Seongheon
Sohn, Kwanghoon
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023: 2538-2547
[44] Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation
Yuan, Yichen
Wang, Yifan
Wang, Lijun
Zhao, Xiaoqi
Lu, Huchuan
Wang, Yu
Su, Weibo
Zhang, Lei
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 966-976
[45] Unsupervised Multimodal Video-to-Video Translation via Self-Supervised Learning
Liu, Kangning
Gu, Shuhang
Romero, Andres
Timofte, Radu
2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021: 1029-1039
[46] An Evaluation of Video-to-Video Face Verification
Poh, Norman
Chan, Chi Ho
Kittler, Josef
Marcel, Sebastien
Mc Cool, Christopher
Argones Rua, Enrique
Alba Castro, Jose Luis
Villegas, Mauricio
Paredes, Roberto
Struc, Vitomir
Pavesic, Nikola
Salah, Albert Ali
Fang, Hui
Costen, Nicholas
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2010, 5 (04): 781-801
[47] Semantic-Guided Zero-Shot Learning for Low-Light Image/Video Enhancement
Zheng, Shen
Gupta, Gaurav
2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW 2022), 2022: 581-590
[48] Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
Fu, Tsu-Jui
Yu, Licheng
Zhang, Ning
Fu, Cheng-Yang
Su, Jong-Chyi
Wang, William Yang
Bell, Sean
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 10681-10692
[49] INFUSION: Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing
Khandelwal, Anant
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023: 3009-3018
[50] Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
Huang, Hanzhuo
Feng, Yufan
Shi, Cheng
Xu, Lan
Yu, Jingyi
Yang, Sibei
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
