FoldGEN: Multimodal Transformer for Garment Sketch-to-Photo Generation

Cited by: 0
Authors
Chen, Jia [1, 2]
Wen, Yanfang [1]
Huang, Jin [1, 2]
Hu, Xinrong [1, 2]
Peng, Tao [1, 2]
Affiliations
[1] Wuhan Textile University, Wuhan 430200, Hubei, People's Republic of China
[2] Engineering Research Center of Hubei Province for Clothing Information, Wuhan 430200, Hubei, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Garment; Sketch-to-photo; Fold Generation; Transformer; Multi-modal;
DOI
10.1007/978-3-031-50072-5_36
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Garment sketch-to-photo generation is one of the most crucial steps in garment design. Most existing methods accept only a single type of conditional information, and combining multiple conditions remains challenging. Moreover, these methods cannot generate garment folds from sketch strokes and suffer from low fidelity. This paper therefore proposes FoldGEN, a two-stage multi-modal framework that generates garment images with folds using sketches and descriptive text as conditional information. In the first stage, we combine discriminator feature matching and the semantic perception of a convolutional neural network within vector quantization, which allows the details and folds of garment images to be reconstructed. In the second stage, a multi-conditional constrained Transformer establishes associations between data of different modalities, so that the generated images reflect not only the text description but also folds corresponding to the sketch strokes. Experiments show that our method generates high-fidelity garment images with different folds from sketches while achieving the best FID and IS on both unimodal and multi-modal tasks.
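The two stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the architecture, losses, or token layout, so the sketch assumes the common pattern in which stage one maps feature vectors to the nearest codebook entry (vector quantization) and stage two feeds a Transformer one sequence of text, sketch, and image tokens. The function names `quantize` and `build_sequence`, the toy codebook, and the vocabulary-offset scheme are all hypothetical.

```python
from typing import List, Sequence


def quantize(vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each vector, the index of its nearest codebook entry.

    Distance is squared Euclidean. This is the generic vector-quantization
    lookup, assumed here for illustration only.
    """
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def build_sequence(text_tokens: Sequence[int],
                   sketch_tokens: Sequence[int],
                   image_tokens: Sequence[int],
                   text_vocab: int,
                   sketch_vocab: int) -> List[int]:
    """Concatenate condition tokens and image tokens into one sequence.

    Sketch and image token IDs are shifted so the three modalities occupy
    disjoint ID ranges (an assumed layout, not taken from the paper).
    """
    seq = list(text_tokens)
    seq += [t + text_vocab for t in sketch_tokens]
    seq += [t + text_vocab + sketch_vocab for t in image_tokens]
    return seq


# Toy data: a 3-entry codebook and three 2-D feature vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

image_tokens = quantize(features, codebook)
sequence = build_sequence([5, 7], [1, 0], image_tokens,
                          text_vocab=10, sketch_vocab=3)
print(image_tokens)
print(sequence)
```

In a setup of this kind, an autoregressive Transformer would be trained to predict the image tokens given the text and sketch tokens that precede them, and a decoder would map the predicted codebook indices back to pixels.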
Pages: 455-466
Page count: 12