TRTST: Arbitrary High-Quality Text-Guided Style Transfer With Transformers

Cited by: 0
Authors
Chen, Haibo [1, 2]
Wang, Zhoujie [1, 2]
Zhao, Lei [3]
Li, Jun [1, 2]
Yang, Jian [1, 2]
Affiliations
[1] Nanjing Univ Sci & Technol, Minist Educ, PCA Lab, Key Lab Intelligent Percept & Syst High Dimens Inf, Nanjing, Peoples R China
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[3] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Transformers; Visualization; Training; Feature extraction; Training data; Image coding; Data models; Painting; Impedance matching; Encoding; Text-guided style transfer; transformer; unpaired; visual quality; generalization ability;
DOI
10.1109/TIP.2025.3530822
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Text-guided style transfer aims to repaint a content image in the target style described by a text prompt, offering greater flexibility and creativity than traditional image-guided style transfer. Despite this potential, existing text-guided style transfer methods often suffer from insufficient visual quality, poor generalization ability, or a reliance on large amounts of paired training data. To address these limitations, we leverage the inherent strengths of transformers in handling multimodal data and propose a novel transformer-based framework called TRTST that not only achieves unpaired arbitrary text-guided style transfer but also significantly improves visual quality. Specifically, TRTST combines a text transformer encoder with an image transformer encoder to project the input text prompt and content image into a joint embedding space and extract the desired style and content features. These features are then fed into a multimodal co-attention module that stylizes the image sequence based on the text sequence. We also propose a new adaptive parametric positional encoding (APPE) scheme, which uses a position encoder to adaptively produce positional encodings matched to different inputs. In addition, to further improve content preservation, we introduce a text-guided identity loss. Extensive experiments and comparisons demonstrate the effectiveness and superiority of our method.
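The abstract's idea of stylizing the image sequence based on the text sequence can be illustrated with a generic single-head cross-attention step, in which image tokens act as queries and text tokens as keys and values. This is only a minimal sketch under stated assumptions, not the TRTST implementation: the function names, the token counts, the feature dimension, the random projection matrices, and the residual connection are all hypothetical choices made for illustration, and APPE and the identity loss are not modeled.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Generic single-head cross-attention (hypothetical helper).

    image_tokens: (n_img, d) content features, used as queries
    text_tokens:  (n_txt, d) style features, used as keys and values
    Returns the updated image tokens (n_img, d) and the
    attention map (n_img, n_txt).
    """
    q = image_tokens @ w_q
    k = text_tokens @ w_k
    v = text_tokens @ w_v
    # Scaled dot-product scores between every image and text token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    # Assumed residual connection: content features pass through unchanged
    # and the text-derived values are added on top.
    return image_tokens + weights @ v, weights


# Toy inputs; all sizes are assumptions chosen for illustration.
rng = np.random.default_rng(0)
d = 8
image_tokens = rng.normal(size=(16, d))  # e.g. a 4x4 grid of image patches
text_tokens = rng.normal(size=(5, d))    # e.g. 5 prompt tokens
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

stylized, attn = cross_attention(image_tokens, text_tokens, w_q, w_k, w_v)
print(stylized.shape, attn.shape)
```

Each row of `attn` is a probability distribution over the text tokens, so every image token receives a weighted mix of text-derived features while its token count and dimension stay unchanged.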
Pages: 759-771
Page count: 13
Related papers
50 in total
  • [21] DiST-GAN: Distillation-based Semantic Transfer for Text-Guided Face Generation
    Yang, Guoxing
    Fu, Feifei
    Fei, Nanyi
    Wu, Haoran
    Ma, Ruitao
    Lu, Zhiwu
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 840 - 845
  • [22] Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
    Lin, Tianwei
    Ma, Zhuoqi
    Li, Fu
    He, Dongliang
    Li, Xin
    Ding, Errui
    Wang, Nannan
    Li, Jie
    Gao, Xinbo
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5137 - 5146
  • [23] IFFMStyle: High-Quality Image Style Transfer Using Invalid Feature Filter Modules
    Xu, Zhijie
    Hou, Liyan
    Zhang, Jianqin
    SENSORS, 2022, 22 (16)
  • [24] TG2: text-guided transformer GAN for restoring document readability and perceived quality
    Kodym, Oldrich
    Hradis, Michal
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2022, 25 (01) : 15 - 28
  • [25] High-Quality Face Caricature via Style Translation
    Laishram, Lamyanba
    Shaheryar, Muhammad
    Lee, Jong Taek
    Jung, Soon Ki
    IEEE ACCESS, 2023, 11 : 138882 - 138896
  • [26] Process Model for Composing High-quality Text Corpora
    Lounela, Mikko
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 87 - 90
  • [27] High-quality text-to-speech synthesis: An overview
    Dutoit, T.
    Journal of Electrical and Electronics Engineering, Australia, 1997, 17 (01): : 25 - 36
  • [28] HIGH-QUALITY COMPUTER TYPESETTING FOR TEXT, FORMULAS, AND LISTINGS
    CORLEY, FC
    IEEE TRANSACTIONS ON ENGINEERING WRITING AND SPEECH, 1969, EW12 (02): : 46 - &
  • [29] Designing Tools for High-Quality Alt Text Authoring
    Mack, Kelly
    Cutrell, Edward
    Lee, Bongshin
    Morris, Meredith Ringel
    23RD INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, ASSETS 2021, 2021,
  • [30] Quality Evaluation of Arbitrary Style Transfer: Subjective Study and Objective Metric
    Chen, Hangwei
    Shao, Feng
    Chai, Xiongli
    Gu, Yuese
    Jiang, Qiuping
    Meng, Xiangchao
    Ho, Yo-Sung
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (07) : 3055 - 3070