StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model

Authors
Xu, Zipeng [1]
Sangineto, Enver [2]
Sebe, Nicu [1]
Affiliations
[1] Univ Trento, Trento, Italy
[2] Univ Modena & Reggio Emilia, Modena, Italy
DOI
10.1109/ICCV51070.2023.00699
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Despite the progress made in the style transfer task, most previous work focuses on transferring only relatively simple features such as color or texture, while missing more abstract concepts such as overall art expression or painter-specific traits. However, these abstract semantics can be captured by models like DALL-E or CLIP, which have been trained on huge datasets of images and textual documents. In this paper, we propose StylerDALLE, a style transfer method that exploits both of these models and uses natural language to describe abstract art styles. Specifically, we formulate the language-guided style transfer task as a non-autoregressive token sequence translation, i.e., from input content image to output stylized image, in the discrete latent space of a large-scale pretrained vector-quantized tokenizer, e.g., the discrete variational auto-encoder (dVAE) of DALL-E. To incorporate style information, we propose a Reinforcement Learning strategy with CLIP-based language supervision that ensures stylization and content preservation simultaneously. Experimental results demonstrate the superiority of our method, which can effectively transfer art styles using language instructions at different granularities. Code is available at https://github.com/zipengxuc/StylerDALLE.
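The abstract names two ingredients: non-autoregressive prediction of discrete image tokens, and a reinforcement-learning signal in place of a differentiable loss. The toy sketch below illustrates only that general idea and is not the authors' implementation. It assumes a REINFORCE-style policy-gradient update (the abstract says only "Reinforcement Learning strategy"), uses free per-position logits instead of a translator network conditioned on content-image tokens, and replaces the CLIP-based reward with a hypothetical `toy_reward` that counts matches against a made-up target sequence. All names and constants are illustrative.

```python
import math
import random

VOCAB_SIZE = 8   # toy codebook size; a real VQ tokenizer's codebook is far larger
SEQ_LEN = 4      # toy number of image tokens per image


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(probs_per_pos, rng):
    """Non-autoregressive sampling: every position is drawn independently."""
    return [rng.choices(range(len(p)), weights=p, k=1)[0] for p in probs_per_pos]


def toy_reward(tokens, target):
    """Hypothetical stand-in for a CLIP-based reward (assumption, not CLIP):
    the fraction of positions that match a made-up target sequence."""
    return sum(t == g for t, g in zip(tokens, target)) / len(target)


def reinforce_step(logits_per_pos, target, rng, lr=1.0, n_samples=16):
    """One REINFORCE update with a mean-reward baseline; returns the mean reward."""
    probs_per_pos = [softmax(row) for row in logits_per_pos]
    samples = [sample_tokens(probs_per_pos, rng) for _ in range(n_samples)]
    rewards = [toy_reward(s, target) for s in samples]
    baseline = sum(rewards) / n_samples
    for tokens, reward in zip(samples, rewards):
        advantage = reward - baseline
        for pos, tok in enumerate(tokens):
            for v, p in enumerate(probs_per_pos[pos]):
                # Gradient of log softmax(logits)[tok] with respect to logit v.
                grad_log_prob = (1.0 if v == tok else 0.0) - p
                logits_per_pos[pos][v] += lr * advantage * grad_log_prob / n_samples
    return baseline


def train(steps=400, seed=0):
    """Train the toy policy; returns (target, greedy tokens, mean-reward history)."""
    rng = random.Random(seed)
    target = [rng.randrange(VOCAB_SIZE) for _ in range(SEQ_LEN)]
    logits = [[0.0] * VOCAB_SIZE for _ in range(SEQ_LEN)]
    history = [reinforce_step(logits, target, rng) for _ in range(steps)]
    greedy = [max(range(VOCAB_SIZE), key=lambda v: row[v]) for row in logits]
    return target, greedy, history


if __name__ == "__main__":
    target, greedy, history = train()
    print("target tokens:", target)
    print("greedy tokens:", greedy)
    print("mean reward, first vs. last step:", history[0], history[-1])
```

Because the reward is only evaluated on sampled token sequences, nothing in the update requires the reward to be differentiable with respect to the tokens, which is the usual motivation for a policy-gradient approach over a discrete latent space.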
Pages: 7567-7577
Number of pages: 11