HARIVO: Harnessing Text-to-Image Models for Video Generation

Authors
Kwon, Mingi [1, 2, 5]
Oh, Seoung Wug [2]
Zhou, Yang [2]
Liu, Difan [2]
Lee, Joon-Young [2]
Cai, Haoran [2]
Liu, Baqiao [2, 3]
Liu, Feng [2, 4]
Uh, Youngjung [1]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
[2] Adobe, San Jose, CA 95110 USA
[3] Univ Illinois, Champaign, IL USA
[4] Portland State Univ, Portland, OR USA
[5] GivernyAI, Giverny, France
Funding
National Research Foundation, Singapore
DOI
10.1007/978-3-031-73668-1_2
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a method to create diffusion-based video models from pretrained Text-to-Image (T2I) models. Recently, AnimateDiff proposed freezing the T2I model while only training temporal layers. We advance this method by proposing a unique architecture, incorporating a mapping network and frame-wise tokens, tailored for video generation while maintaining the diversity and creativity of the original T2I model. Key innovations include novel loss functions for temporal smoothness and a mitigating gradient sampling technique, ensuring realistic and temporally consistent video generation despite limited public video data. We have successfully integrated video-specific inductive biases into the architecture and loss functions. Our method, built on the frozen StableDiffusion model, simplifies training processes and allows for seamless integration with off-the-shelf models like ControlNet and DreamBooth. Project page: https://kwonminki.github.io/HARIVO/.
Pages: 19-36
Page count: 18