HARIVO: Harnessing Text-to-Image Models for Video Generation

Cited by: 0
Authors
Kwon, Mingi [1 ,2 ,5 ]
Oh, Seoung Wug [2 ]
Zhou, Yang [2 ]
Liu, Difan [2 ]
Lee, Joon-Young [2 ]
Cai, Haoran [2 ]
Liu, Baqiao [2 ,3 ]
Liu, Feng [2 ,4 ]
Uh, Youngjung [1 ]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
[2] Adobe, San Jose, CA 95110 USA
[3] Univ Illinois, Champaign, IL USA
[4] Portland State Univ, Portland, OR USA
[5] GivernyAI, Giverny, France
Funding
National Research Foundation of Singapore
DOI
10.1007/978-3-031-73668-1_2
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We present a method to create diffusion-based video models from pretrained Text-to-Image (T2I) models. Recently, AnimateDiff proposed freezing the T2I model while training only the temporal layers. We advance this approach with a unique architecture, incorporating a mapping network and frame-wise tokens, tailored for video generation while preserving the diversity and creativity of the original T2I model. Key innovations include novel loss functions for temporal smoothness and a mitigating gradient sampling technique, which ensure realistic and temporally consistent video generation despite limited public video data. We successfully integrate video-specific inductive biases into the architecture and loss functions. Our method, built on the frozen StableDiffusion model, simplifies training and allows seamless integration with off-the-shelf models such as ControlNet and DreamBooth. Project page: https://kwonminki.github.io/HARIVO/.
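The abstract's core recipe, inherited from AnimateDiff, is to keep the pretrained T2I weights frozen and optimize only newly inserted temporal layers. A minimal PyTorch sketch of that training setup follows; the module and layer names here are illustrative stand-ins, not the authors' actual architecture (which additionally adds a mapping network and frame-wise tokens).

```python
import torch.nn as nn

# Hypothetical sketch: a toy "video U-Net" wrapping a pretrained spatial
# (T2I) layer plus a new temporal layer. Only the temporal layer trains.
class VideoUNet(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.spatial = nn.Linear(dim, dim)   # stands in for frozen T2I weights
        self.temporal = nn.Linear(dim, dim)  # newly added, trainable

    def forward(self, x):
        return self.temporal(self.spatial(x))

model = VideoUNet()
for p in model.spatial.parameters():         # freeze the pretrained part
    p.requires_grad = False

# Only the temporal layer's parameters remain trainable.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # → ['temporal.weight', 'temporal.bias']
```

In practice an optimizer would then be constructed over `filter(lambda p: p.requires_grad, model.parameters())`, so gradient updates never touch the frozen T2I weights, which is what preserves the base model's diversity and its compatibility with off-the-shelf add-ons like ControlNet and DreamBooth.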
Pages: 19-36 (18 pages)