Toward Multi-Modal Conditioned Fashion Image Translation

Cited by: 13
Authors
Gu, Xiaoling [1]
Yu, Jun [1]
Wong, Yongkang [2]
Kankanhalli, Mohan S. [2]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Key Lab Complex Syst Modeling & Simulat, Hangzhou 310018, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore 119613, Singapore
Funding
National Science Foundation (USA); National Research Foundation, Singapore
Keywords
Generative adversarial network; fashion image synthesis; image-to-image translation; RETRIEVAL;
DOI
10.1109/TMM.2020.3009500
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Having the capability to synthesize photo-realistic fashion product images conditioned on multiple attributes or modalities would enable many exciting new applications. In this work, we propose an end-to-end network architecture, built upon a new generative adversarial network, for automatically synthesizing photo-realistic images of fashion products under multiple conditions. Given an input pose image containing a 2D skeleton pose and a sentence describing fashion products, our model synthesizes a fashion image that depicts the same pose wearing the fashion products described in the text. Specifically, the generator G tries to generate realistic-looking fashion images conditioned on a <pose, text> pair in order to fool the discriminator. An attention network is added to enhance the generator; it predicts a probability map indicating which parts of the image need to be attended to for translation. In contrast, the discriminator D distinguishes real images from translated ones based on the input pose image and text description. The discriminator is divided into two multi-scale sub-discriminators to improve its ability to distinguish real images from translated ones. Quantitative and qualitative analyses demonstrate that our method is capable of synthesizing realistic images that retain the poses of the given images while matching the semantics of the provided sentence descriptions.
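The abstract does not spell out how the <pose, text> condition reaches the two multi-scale sub-discriminators. One common pattern, assumed here and not confirmed by the abstract, is to replicate a sentence embedding over the spatial grid, concatenate it with the image and pose channels, and feed the result to one sub-discriminator at full resolution and to another after 2x2 average pooling (the pix2pixHD-style multi-scale setup). The pure-Python sketch below shows only that input plumbing; the function names (`tile_text_embedding`, `concat_channels`, `avg_pool_2x`, `discriminator_inputs`) are hypothetical, the text encoder and the sub-discriminator networks themselves are omitted, and feature maps are nested lists ordered channel x height x width.

```python
from typing import List

# Hypothetical sketch: the conditioning scheme below is an assumption for
# illustration, not the paper's published implementation.
# A feature map is a list of channels; each channel is an H x W grid of floats.
FeatureMap = List[List[List[float]]]


def tile_text_embedding(embedding: List[float], height: int, width: int) -> FeatureMap:
    """Replicate each embedding value over an H x W grid, giving one channel per value."""
    return [[[value] * width for _ in range(height)] for value in embedding]


def concat_channels(*maps: FeatureMap) -> FeatureMap:
    """Stack feature maps along the channel axis; all maps must share H and W."""
    sizes = {(len(m[0]), len(m[0][0])) for m in maps}
    if len(sizes) != 1:
        raise ValueError(f"spatial sizes differ: {sorted(sizes)}")
    return [channel for m in maps for channel in m]


def avg_pool_2x(fmap: FeatureMap) -> FeatureMap:
    """Halve the spatial resolution with non-overlapping 2x2 average pooling."""
    pooled = []
    for ch in fmap:
        rows, cols = len(ch) // 2, len(ch[0]) // 2  # an odd trailing row/col is dropped
        pooled.append([
            [(ch[2 * i][2 * j] + ch[2 * i][2 * j + 1]
              + ch[2 * i + 1][2 * j] + ch[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(cols)]
            for i in range(rows)
        ])
    return pooled


def discriminator_inputs(image: FeatureMap, pose: FeatureMap,
                         text_embedding: List[float]) -> List[FeatureMap]:
    """Return conditioned inputs for two sub-discriminators: full and half scale."""
    height, width = len(image[0]), len(image[0][0])
    text_maps = tile_text_embedding(text_embedding, height, width)
    full_scale = concat_channels(image, pose, text_maps)
    return [full_scale, avg_pool_2x(full_scale)]


if __name__ == "__main__":
    # Toy 4x4 example: 3 image channels, 1 skeleton channel, 2-d text embedding.
    image = [[[0.0, 1.0, 2.0, 3.0] for _ in range(4)] for _ in range(3)]
    pose = [[[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]]
    text = [0.5, -0.5]  # hypothetical sentence embedding
    for scale in discriminator_inputs(image, pose, text):
        print(len(scale), "channels at", len(scale[0]), "x", len(scale[0][0]))
```

With this toy input the script prints 6 channels at 4 x 4 and 6 channels at 2 x 2. The half-resolution sub-discriminator sees a coarser, more global view while the full-resolution one sees finer local detail, which is the usual motivation for multi-scale discriminators; the actual scales and architectures used in the paper are not stated in the abstract.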
Pages: 2361-2371
Number of pages: 11
Related papers
50 in total
  • [41] MCAD: Multi-modal Conditioned Adversarial Diffusion Model for High-Quality PET Image Reconstruction
    Cui, Jiaqi
    Zeng, Xinyi
    Zeng, Pinxian
    Liu, Bo
    Wu, Xi
    Zhou, Jiliu
    Wang, Yan
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VII, 2024, 15007 : 467 - 477
  • [42] The multi-modal universe of fast-fashion: the Visuelle 2.0 benchmark
    Skenderi, Geri
    Joppi, Christian
    Denitto, Matteo
    Scarpa, Berniero
    Cristani, Marco
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 2240 - 2245
  • [43] Guided Image Deblurring by Deep Multi-Modal Image Fusion
    Liu, Yuqi
    Sheng, Zehua
    Shen, Hui-Liang
    IEEE ACCESS, 2022, 10 : 130708 - 130718
  • [44] Principle-to-program: Neural Fashion Recommendation with Multi-modal Input
    Chelliah, Muthusamy
    Biswas, Soma
    Dhakad, Lucky
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2706 - 2708
  • [45] Toward's Arabic Multi-modal Sentiment Analysis
    Alqarafi, Abdulrahman S.
    Adeel, Ahsan
    Gogate, Mandar
    Dashitpour, Kia
    Hussain, Amir
    Durrani, Tariq
    COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, 2019, 463 : 2378 - 2386
  • [46] MM-FRec: Multi-Modal Enhanced Fashion Item Recommendation
    Song, Xuemeng
    Wang, Chun
    Sun, Changchang
    Feng, Shanshan
    Zhou, Min
    Nie, Liqiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (10) : 10072 - 10084
  • [47] Multi-modal and multi-vendor retina image registration
    Li, Zhang
    Huang, Fan
    Zhang, Jiong
    Dashtbozorg, Behdad
    Abbasi-Sureshjani, Samaneh
    Sun, Yue
    Long, Xi
    Yu, Qifeng
    Romeny, Bart ter Haar
    Tan, Tao
    BIOMEDICAL OPTICS EXPRESS, 2018, 9 (02) : 410 - 422
  • [48] Robust Multi-Scale Multi-modal Image Registration
    Holtzman-Gazit, Michal
    Yavneh, Irad
    SIGNAL PROCESSING, SENSOR FUSION, AND TARGET RECOGNITION XIX, 2010, 7697
  • [49] Efficient text-image semantic search: A multi-modal vision-language approach for fashion retrieval
    Moro, Gianluca
    Salvatori, Stefano
    Frisoni, Giacomo
    NEUROCOMPUTING, 2023, 538
  • [50] PRFusion: Toward Effective and Robust Multi-Modal Place Recognition With Image and Point Cloud Fusion
    Wang, Sijie
    Kang, Qiyu
    She, Rui
    Zhao, Kai
    Song, Yang
    Tay, Wee Peng
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, : 20523 - 20534