HARIVO: Harnessing Text-to-Image Models for Video Generation

Times cited: 0

Authors:

Kwon, Mingi [1,2,5]

Oh, Seoung Wug [2]

Zhou, Yang [2]

Liu, Difan [2]

Lee, Joon-Young [2]

Cai, Haoran [2]

Liu, Baqiao [2,3]

Liu, Feng [2,4]

Uh, Youngjung [1]

Affiliations:

[1] Yonsei Univ, Seoul, South Korea

[2] Adobe, San Jose, CA 95110 USA

[3] Univ Illinois, Champaign, IL USA

[4] Portland State Univ, Portland, OR USA

[5] GivernyAI, Giverny, France

Source:

COMPUTER VISION - ECCV 2024, PT LIII | 2025 / Vol. 15111

Funding:

National Research Foundation, Singapore;

Keywords:

DOI:

10.1007/978-3-031-73668-1_2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405;

Abstract:

We present a method to create diffusion-based video models from pretrained Text-to-Image (T2I) models. Recently, AnimateDiff proposed freezing the T2I model and training only the temporal layers. We advance this approach with a unique architecture, incorporating a mapping network and frame-wise tokens, tailored for video generation while maintaining the diversity and creativity of the original T2I model. Key innovations include novel loss functions for temporal smoothness and a mitigating gradient sampling technique, which together ensure realistic and temporally consistent video generation despite limited public video data. We have successfully integrated video-specific inductive biases into the architecture and loss functions. Our method, built on the frozen StableDiffusion model, simplifies training and allows seamless integration with off-the-shelf models such as ControlNet and DreamBooth. Project page: https://kwonminki.github.io/HARIVO/.
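The abstract names loss functions for temporal smoothness but this record does not state their form. As a hedged illustration of the general idea only, the sketch below computes a generic first-order temporal smoothness penalty: the mean squared difference between the feature vectors of consecutive frames. The function name `temporal_smoothness_penalty` and the list-of-frames input are hypothetical; this is not the HARIVO loss.

```python
from typing import Sequence


def temporal_smoothness_penalty(frames: Sequence[Sequence[float]]) -> float:
    """Mean squared difference between consecutive frames.

    Generic illustration only (hypothetical helper): `frames` holds T
    per-frame feature vectors of equal length D. This is NOT the HARIVO
    loss, whose exact form is not given in this record.
    """
    if len(frames) < 2:
        return 0.0  # a single frame has no temporal neighbour to compare with
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same length")
    total = 0.0
    for prev, curr in zip(frames[:-1], frames[1:]):
        # squared L2 distance between frame t and frame t+1
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    # average over the (T - 1) frame pairs and the D feature dimensions
    return total / ((len(frames) - 1) * dim)


if __name__ == "__main__":
    static = [[1.0, 2.0]] * 4
    flicker = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
    print(temporal_smoothness_penalty(static))   # 0.0
    print(temporal_smoothness_penalty(flicker))  # 1.0
```

A static sequence scores 0.0 while a sequence that flickers between two states scores high, which is the behavior a smoothness term is meant to discourage during training.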


Pages: 19-36

Page count: 18
