HARIVO: Harnessing Text-to-Image Models for Video Generation

Authors
Kwon, Mingi [1, 2, 5]
Oh, Seoung Wug [2 ]
Zhou, Yang [2 ]
Liu, Difan [2 ]
Lee, Joon-Young [2 ]
Cai, Haoran [2 ]
Liu, Baqiao [2, 3]
Liu, Feng [2, 4]
Uh, Youngjung [1 ]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
[2] Adobe, San Jose, CA 95110 USA
[3] Univ Illinois, Champaign, IL USA
[4] Portland State Univ, Portland, OR USA
[5] GivernyAI, Giverny, France
Funding
National Research Foundation, Singapore
DOI
10.1007/978-3-031-73668-1_2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a method to create diffusion-based video models from pretrained Text-to-Image (T2I) models. Recently, AnimateDiff proposed freezing the T2I model while training only the temporal layers. We advance this method by proposing a unique architecture, incorporating a mapping network and frame-wise tokens, tailored for video generation while maintaining the diversity and creativity of the original T2I model. Key innovations include novel loss functions for temporal smoothness and a mitigating gradient sampling technique, ensuring realistic and temporally consistent video generation despite limited public video data. We have successfully integrated video-specific inductive biases into the architecture and loss functions. Our method, built on the frozen Stable Diffusion model, simplifies training and allows for seamless integration with off-the-shelf models like ControlNet and DreamBooth. Project page: https://kwonminki.github.io/HARIVO/.
Pages: 19-36
Page count: 18