TRTST: Arbitrary High-Quality Text-Guided Style Transfer With Transformers

Cited: 0
Authors
Chen, Haibo [1 ,2 ]
Wang, Zhoujie [1 ,2 ]
Zhao, Lei [3 ]
Li, Jun [1 ,2 ]
Yang, Jian [1 ,2 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Minist Educ, PCA Lab, Key Lab Intelligent Percept & Syst High Dimens Inf, Nanjing, Peoples R China
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
Funding
US National Science Foundation;
Keywords
Transformers; Visualization; Training; Feature extraction; Training data; Image coding; Data models; Painting; Impedance matching; Encoding; Text-guided style transfer; transformer; unpaired; visual quality; generalization ability;
DOI
10.1109/TIP.2025.3530822
Chinese Library Classification (CLC) Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-guided style transfer aims to repaint a content image in the target style described by a text prompt, offering greater flexibility and creativity than traditional image-guided style transfer. Despite this potential, existing text-guided style transfer methods often suffer from insufficient visual quality, poor generalization ability, or a reliance on large amounts of paired training data. To address these limitations, we leverage the inherent strengths of transformers in handling multimodal data and propose a novel transformer-based framework, TRTST, that not only achieves unpaired arbitrary text-guided style transfer but also significantly improves visual quality. Specifically, TRTST combines a text transformer encoder with an image transformer encoder to project the input text prompt and content image into a joint embedding space and to extract the desired style and content features. These features are then fed into a multimodal co-attention module that stylizes the image sequence conditioned on the text sequence. We also propose a new adaptive parametric positional encoding (APPE) scheme, which uses a position encoder to adaptively produce positional encodings that optimally match different inputs. In addition, to further improve content preservation, we introduce a text-guided identity loss into our model. Extensive experiments and comparisons demonstrate the effectiveness and superiority of our method.
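For readers who want a concrete picture of the pipeline the abstract sketches, the PyTorch snippet below shows one plausible arrangement of the named components: a patch-based image encoder and a text embedding projected into a shared token space, a content-dependent positional encoding standing in for the APPE idea, and a co-attention block in which image tokens attend to text tokens. All class names, dimensions, and layer choices are illustrative assumptions, and the training objectives (including the text-guided identity loss) are omitted; this is a minimal sketch, not the authors' implementation (see the DOI above for the paper).

```python
# Minimal sketch (not the authors' code) of a text-guided stylization pipeline:
# joint embedding of text and image tokens, adaptive positional encoding,
# and cross-modal co-attention. All hyperparameters are illustrative.
import torch
import torch.nn as nn


class AdaptivePositionalEncoding(nn.Module):
    """Predicts a positional term from the tokens themselves instead of using
    a fixed table, loosely following the 'adaptive parametric' (APPE) idea."""

    def __init__(self, dim):
        super().__init__()
        self.pos_mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, tokens):                 # tokens: (B, N, C)
        return tokens + self.pos_mlp(tokens)   # content-dependent positional term


class TextImageCoAttention(nn.Module):
    """Image tokens attend to text tokens (cross-attention), so the style
    described by the prompt modulates the content token sequence."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, img_tokens, txt_tokens):
        attn_out, _ = self.cross_attn(img_tokens, txt_tokens, txt_tokens)
        x = self.norm(img_tokens + attn_out)
        return x + self.ffn(x)


class TextGuidedStylizer(nn.Module):
    """Toy end-to-end model: patch-embed the image, embed the text tokens,
    add adaptive positional encodings, fuse with co-attention, decode."""

    def __init__(self, dim=256, patch=8, vocab=49408):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.txt_embed = nn.Embedding(vocab, dim)
        self.appe = AdaptivePositionalEncoding(dim)
        self.fuse = TextImageCoAttention(dim)
        self.decode = nn.ConvTranspose2d(dim, 3, kernel_size=patch, stride=patch)

    def forward(self, image, text_ids):        # image: (B, 3, H, W), text_ids: (B, L)
        feats = self.patch_embed(image)        # (B, C, H/p, W/p)
        b, c, h, w = feats.shape
        img_tokens = self.appe(feats.flatten(2).transpose(1, 2))  # (B, N, C)
        txt_tokens = self.appe(self.txt_embed(text_ids))          # (B, L, C)
        fused = self.fuse(img_tokens, txt_tokens)                 # stylized tokens
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        return torch.sigmoid(self.decode(fused))                  # (B, 3, H, W)


if __name__ == "__main__":
    model = TextGuidedStylizer()
    out = model(torch.rand(1, 3, 64, 64), torch.randint(0, 49408, (1, 16)))
    print(out.shape)  # torch.Size([1, 3, 64, 64])
```

Using the image sequence as queries and the text sequence as keys/values is one straightforward way to realize the "stylize the image sequence based on the text sequence" step described in the abstract; the paper's actual module design may differ.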
Pages: 759 - 771
Number of pages: 13
Related Papers
50 records in total
  • [41] IDET: Iterative Difference-Enhanced Transformers for High-Quality Change Detection
    Guo, Qing
    Wang, Ruofei
    Huang, Rui
    Wan, Renjie
    Sun, Shuifa
    Zhang, Yuxiang
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2025,
  • [42] Retinex-Guided Channel Grouping-Based Patch Swap for Arbitrary Style Transfer
    Liu, Chang
    Niu, Yi
    Ma, Mingming
    Li, Fu
    Shi, Guangming
    IEEE MULTIMEDIA, 2024, 31 (01) : 7 - 18
  • [43] DEEP PRIOR GUIDED NETWORK FOR HIGH-QUALITY IMAGE FUSION
    Yin, Jia-Li
    Chen, Bo-Hao
    Peng, Yan-Tsung
    Tsai, Chung-Chi
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [44] Guided Image Filtering for Interactive High-quality Global Illumination
    Bauszat, Pablo
    Eisemann, Martin
    Magnor, Marcus
    COMPUTER GRAPHICS FORUM, 2011, 30 (04) : 1361 - 1368
  • [45] USE OF HIGH-QUALITY ADULT SOWS FOR EMBRYO TRANSFER
    BRUSSOW, KP
    KAUFFOLD, M
    GEORGE, G
    THIEME, HJ
    MAASS, P
MONATSHEFTE FUR VETERINARMEDIZIN, 1989, 44 (09) : 317 - 320
  • [46] Long quantum channels for high-quality entanglement transfer
    Banchi, L.
    Apollaro, T. J. G.
    Cuccoli, A.
    Vaia, R.
    Verrucchi, P.
    NEW JOURNAL OF PHYSICS, 2011, 13
  • [47] HIGH-QUALITY TONER IMAGE TRANSFER IN ELECTROPHOTOGRAPHIC PRINTING
    OGASAWARA, M
    KIMURA, M
FUJITSU SCIENTIFIC & TECHNICAL JOURNAL, 1988, 24 (03) : 235 - 241
  • [48] High-Quality Uniform Dry Transfer of Graphene to Polymers
    Lock, Evgeniya H.
    Baraket, Mira
    Laskoski, Matthew
    Mulvaney, Shawn P.
    Lee, Woo K.
    Sheehan, Paul E.
    Hines, Daniel R.
    Robinson, Jeremy T.
    Tosado, Jacob
    Fuhrer, Michael S.
    Hernandez, Sandra C.
Walton, Scott G.
    NANO LETTERS, 2012, 12 (01) : 102 - 107
  • [49] MedSyn: Text-Guided Anatomy-Aware Synthesis of High-Fidelity 3-D CT Images
    Xu, Yanwu
    Sun, Li
    Peng, Wei
    Jia, Shuyue
    Morrison, Katelyn
    Perer, Adam
    Zandifar, Afrooz
    Visweswaran, Shyam
    Eslami, Motahhare
    Batmanghelich, Kayhan
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (10) : 3648 - 3660
  • [50] An Advanced NLP Framework for High-Quality Text-to-Speech Synthesis
    Ungurean, Catalin
    Burileanu, Dragos
    2011 6TH CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2011,