Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

Times cited: 11
Authors
Yang, Shuai [1]
Zhou, Yifan [1]
Liu, Ziwei [1]
Loy, Chen Change [1]
Affiliation
[1] Nanyang Technological University, S-Lab, Singapore
Keywords
Video translation; temporal consistency; off-the-shelf Stable Diffusion; optical flow
DOI
10.1145/3610548.3618160
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large text-to-image diffusion models have exhibited impressive proficiency in generating high-quality images. However, when these models are applied to the video domain, ensuring temporal consistency across video frames remains a formidable challenge. This paper proposes a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos. The framework includes two parts: key frame translation and full video translation. The first part uses an adapted diffusion model to generate key frames, with hierarchical cross-frame constraints applied to enforce coherence in shapes, textures and colors. The second part propagates the key frames to other frames with temporal-aware patch matching and frame blending. Our framework achieves temporal consistency in both global style and local texture at low cost (without re-training or optimization). The adaptation is compatible with existing image diffusion techniques, allowing our framework to take advantage of them, for example customizing a specific subject with LoRA or introducing extra spatial guidance with ControlNet. Extensive experimental results demonstrate the effectiveness of our proposed framework over existing methods in rendering high-quality and temporally coherent videos. Code is available at our project page: https://www.mmlab-ntu.com/project/rerender/
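The second stage in the abstract (propagating translated key frames to the frames between them, then blending) can be illustrated with a minimal sketch. The paper itself propagates with temporal-aware patch matching; the sketch below plainly swaps in a simpler technique, dense optical-flow backward warping with nearest-neighbour sampling, and blends the two warped key frames with a linear weight based on temporal distance. The function names `warp_backward` and `blend_between_keys`, the flow convention, and the linear weighting are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def warp_backward(src: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Warp a stylized key frame `src` (H, W, C) into a target frame.

    Assumed convention (hypothetical): flow[y, x] = (dx, dy) points from
    target pixel (x, y) to its matching location in `src`. Nearest-neighbour
    sampling with border clamping keeps the sketch dependency-free.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return src[src_y, src_x]


def blend_between_keys(key_a, key_b, flow_to_a, flow_to_b, t, t_a, t_b):
    """Propagate key frames at times t_a < t_b to frame t and blend them.

    Stand-in for the paper's patch-matching propagation: each key frame is
    warped by flow, then mixed with a linear weight so the temporally closer
    key frame contributes more (an assumed weighting).
    """
    warped_a = warp_backward(key_a, flow_to_a)
    warped_b = warp_backward(key_b, flow_to_b)
    w_b = (t - t_a) / (t_b - t_a)  # 0 at key frame a, 1 at key frame b
    return (1.0 - w_b) * warped_a + w_b * warped_b
```

With zero flow, an all-black key frame at t=0 and an all-white one at t=4, the propagated frame at t=2 comes out uniformly 0.5 gray. Pure warping like this cannot fill disoccluded regions, which is one reason a patch-based propagation is preferable in practice.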
Pages: 11