Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

Cited by: 11
Authors
Yang, Shuai [1]
Zhou, Yifan [1]
Liu, Ziwei [1]
Loy, Chen Change [1]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
Keywords
Video translation; temporal consistency; off-the-shelf Stable Diffusion; optical flow
DOI
10.1145/3610548.3618160
CLC number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Large text-to-image diffusion models have exhibited impressive proficiency in generating high-quality images. However, when applying these models to the video domain, ensuring temporal consistency across video frames remains a formidable challenge. This paper proposes a novel zero-shot text-guided video-to-video translation framework that adapts image models to videos. The framework comprises two parts: key frame translation and full video translation. The first part uses an adapted diffusion model to generate key frames, with hierarchical cross-frame constraints applied to enforce coherence in shapes, textures, and colors. The second part propagates the key frames to the remaining frames with temporal-aware patch matching and frame blending. Our framework achieves global style and local texture temporal consistency at a low cost (without re-training or optimization). The adaptation is compatible with existing image diffusion techniques, allowing our framework to take advantage of them, such as customizing a specific subject with LoRA or introducing extra spatial guidance with ControlNet. Extensive experimental results demonstrate the effectiveness of our proposed framework over existing methods in rendering high-quality, temporally coherent videos. Code is available on our project page: https://www.mmlab-ntu.com/project/rerender/
Pages: 11
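
To make the two-stage design concrete, below is a minimal sketch in Python of the pipeline the abstract describes. It is not the authors' Rerender-A-Video implementation (see the project page for that) and it assumes the diffusers, opencv-python, and Pillow packages; the model IDs, Canny thresholds, and Farneback flow parameters are illustrative choices. A shared random seed stands in for the paper's hierarchical cross-frame constraints, and dense-flow warping stands in for its temporal-aware patch matching and frame blending.

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Illustrative model choices, not the paper's exact configuration.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet, torch_dtype=torch.float16).to("cuda")

def translate_keyframe(frame_bgr, prompt, seed=42):
    # Stage 1 (sketch): restyle one key frame. ControlNet supplies spatial
    # (edge) guidance; a shared seed only loosely approximates the paper's
    # hierarchical cross-frame constraints. Frame sides should be multiples
    # of 8 for Stable Diffusion (e.g. 512x512).
    edges = cv2.Canny(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY), 100, 200)
    out = pipe(prompt=prompt,
               image=Image.fromarray(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)),
               control_image=Image.fromarray(np.stack([edges] * 3, axis=-1)),
               strength=0.75,
               generator=torch.Generator("cuda").manual_seed(seed)).images[0]
    return cv2.cvtColor(np.array(out), cv2.COLOR_RGB2BGR)

def propagate(key_styled, key_orig, frame_orig):
    # Stage 2 (sketch): backward-warp the stylized key frame to a nearby
    # frame with dense optical flow, a crude stand-in for the paper's
    # temporal-aware patch matching and frame blending.
    g_key = cv2.cvtColor(key_orig, cv2.COLOR_BGR2GRAY)
    g_frm = cv2.cvtColor(frame_orig, cv2.COLOR_BGR2GRAY)
    # Flow from the target frame back to the key frame, so every output
    # pixel can be sampled from the stylized key frame.
    flow = cv2.calcOpticalFlowFarneback(g_frm, g_key, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    ys, xs = np.mgrid[0:g_key.shape[0], 0:g_key.shape[1]].astype(np.float32)
    return cv2.remap(key_styled, xs + flow[..., 0], ys + flow[..., 1],
                     cv2.INTER_LINEAR)

In the actual framework, key frames are translated jointly under hierarchical cross-frame constraints rather than independently with a shared seed, and non-key frames are synthesized by temporal-aware patch matching and frame blending rather than single-source warping; the sketch only conveys the division of labor between the two stages.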