TRTST: Arbitrary High-Quality Text-Guided Style Transfer With Transformers

Cited by: 0
Authors
Chen, Haibo [1 ,2 ]
Wang, Zhoujie [1 ,2 ]
Zhao, Lei [3 ]
Li, Jun [1 ,2 ]
Yang, Jian [1 ,2 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Minist Educ, PCA Lab, Key Lab Intelligent Percept & Syst High Dimens Inf, Nanjing, Peoples R China
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
Funding
US National Science Foundation;
Keywords
Transformers; Visualization; Training; Feature extraction; Training data; Image coding; Data models; Painting; Impedance matching; Encoding; Text-guided style transfer; transformer; unpaired; visual quality; generalization ability;
DOI
10.1109/TIP.2025.3530822
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Text-guided style transfer aims to repaint a content image in a target style described by a text prompt, offering greater flexibility and creativity than traditional image-guided style transfer. Despite this potential, existing text-guided style transfer methods often suffer from insufficient visual quality, poor generalization ability, or a reliance on large amounts of paired training data. To address these limitations, we leverage the inherent strengths of transformers in handling multimodal data and propose a novel transformer-based framework, TRTST, that not only achieves unpaired arbitrary text-guided style transfer but also significantly improves visual quality. Specifically, TRTST combines a text transformer encoder with an image transformer encoder to project the input text prompt and content image into a joint embedding space and extract the desired style and content features. These features are then fed into a multimodal co-attention module that stylizes the image sequence conditioned on the text sequence. We also propose a new adaptive parametric positional encoding (APPE) scheme, in which a position encoder adaptively produces different positional encodings to optimally match different inputs. In addition, to further improve content preservation, we introduce a text-guided identity loss. Extensive experiments and comparisons demonstrate the effectiveness and superiority of our method.
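The abstract describes APPE only at a high level: rather than a fixed positional table, a learned position encoder produces encodings conditioned on the input. As a rough, non-authoritative sketch of that idea only (the function name `appe`, the tanh position encoder, and all shapes are our own illustrative assumptions, not the authors' implementation), an input-conditioned positional encoding could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def appe(tokens, w, b):
    """Illustrative sketch of an adaptive parametric positional encoding:
    a small learned position encoder (here a single tanh layer) maps each
    input token to its own positional code, instead of looking the code
    up in a fixed sinusoidal table."""
    # tokens: (seq_len, d_model); w: (d_model, d_model); b: (d_model,)
    pos = np.tanh(tokens @ w + b)   # input-dependent positional code
    return tokens + pos             # injected additively, as is standard

d = 8
tokens = rng.normal(size=(5, d))            # a toy 5-token sequence
w = rng.normal(scale=0.1, size=(d, d))      # stand-in for learned weights
b = np.zeros(d)
out = appe(tokens, w, b)
print(out.shape)  # (5, 8)
```

Because the encoder is parametric, two different input sequences receive different positional codes, which is the property the abstract attributes to APPE; the actual position encoder architecture in the paper may differ.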
Pages: 759-771
Page count: 13