Disentangled Representation Learning for Controllable Person Image Generation

Cited by: 0
Authors
Xu, Wenju [1]
Long, Chengjiang [2]
Nie, Yongwei [3]
Wang, Guanghui [4]
Affiliations
[1] AMAZON, Palo Alto, CA 94301 USA
[2] META Real Labs, Burlingame, CA 94010 USA
[3] South China Univ Technol, Guangzhou 510006, Peoples R China
[4] Toronto Metropolitan Univ, Dept Comp Sci, Toronto, ON M5B 2K3, Canada
Keywords
Disentangled representation; Transformer; controllable person synthesize; NETWORKS;
DOI
10.1109/TMM.2023.3345180
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we propose a novel framework named DRL-CPG to learn disentangled latent representation for controllable person image generation, which can produce realistic person images with desired poses and human attributes (e.g. pose, head, upper clothes, and pants) provided by various source persons. Unlike the existing works leveraging the semantic masks to obtain the representation of each component, we propose to generate disentangled latent code via a novel attribute encoder with transformers trained in a manner of curriculum learning from a relatively easy step to a gradually hard one. A random component mask-agnostic strategy is introduced to randomly remove component masks from the person segmentation masks, which aims at increasing the difficulty of training and promoting the transformer encoder to recognize the underlying boundaries between each component. This enables the model to transfer both the shape and texture of the components. Furthermore, we propose a novel attribute decoder network to integrate multi-level attributes (e.g. the structure feature and the attribute representation) with well-designed Dual Adaptive Denormalization (DAD) residual blocks. Extensive experiments strongly demonstrate that the proposed approach is able to transfer both the texture and shape of different human parts and yield realistic results. To our knowledge, we are the first to learn disentangled latent representations with transformers for person image generation.
Pages: 6065 - 6077
Number of pages: 13
Related papers
50 items in total
  • [1] Disentangled Person Image Generation
    Ma, Liqian
    Sun, Qianru
    Georgoulis, Stamatios
    Van Gool, Luc
    Schiele, Bernt
    Fritz, Mario
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 99 - 108
  • [2] Disentangled Representation Learning for Controllable Image Synthesis: an Information-Theoretic Perspective
    Tang, Shichang
    Zhou, Xu
    He, Xuming
    Ma, Yi
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 10042 - 10049
  • [3] Controllable image generation based on causal representation learning
    Huang, Shanshan
    Wang, Yuanhao
    Gong, Zhili
    Liao, Jun
    Wang, Shu
    Liu, Li
    [J]. FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2024, 25 (01) : 135 - 148
  • [4] Learning Disentangled User Representation Based on Controllable VAE for Recommendation
    Li, Yunyi
    Zhao, Pengpeng
    Wang, Deqing
    Xian, Xuefeng
    Liu, Yanchi
    Sheng, Victor S.
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT III, 2021, 12683 : 179 - 194
  • [5] Image Generation Method for Cognizing Image Attribute Features from the Perspective of Disentangled Representation Learning
    Cai, Jianghai
    Huang, Chengquan
    Wang, Shunxia
    Luo, Senyan
    Yang, Guiyan
    Zhou, Lihua
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2024, 37 (07) : 638 - 651
  • [6] Disentangled Representation Learning and Generation With Manifold Optimization
    Pandey, Arun
    Fanuel, Michael
    Schreurs, Joachim
    Suykens, Johan A. K.
    [J]. NEURAL COMPUTATION, 2022, 34 (10) : 2009 - 2036
  • [7] Disentangled representation learning in cardiac image analysis
    Chartsias, Agisilaos
    Joyce, Thomas
    Papanastasiou, Giorgos
    Semple, Scott
    Williams, Michelle
    Newby, David E.
    Dharmakumar, Rohan
    Tsaftaris, Sotirios A.
    [J]. MEDICAL IMAGE ANALYSIS, 2019, 58
  • [8] Learning Disentangled Representation for Robust Person Re-identification
    Eom, Chanho
    Ham, Bumsub
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] Small molecule generation via disentangled representation learning
    Du, Yuanqi
    Guo, Xiaojie
    Wang, Yinkai
    Shehu, Amarda
    Zhao, Liang
    [J]. BIOINFORMATICS, 2022, 38 (12) : 3200 - 3208
  • [10] Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning
    Deng, Yu
    Yang, Jiaolong
    Chen, Dong
    Wen, Fang
    Tong, Xin
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020: 5153 - 5162