Disentangled Representation Learning for Controllable Person Image Generation

Times cited: 0
Authors
Xu, Wenju [1]
Long, Chengjiang [2]
Nie, Yongwei [3]
Wang, Guanghui [4]
Affiliations
[1] Amazon, Palo Alto, CA 94301 USA
[2] Meta Reality Labs, Burlingame, CA 94010 USA
[3] South China University of Technology, Guangzhou 510006, People's Republic of China
[4] Toronto Metropolitan University, Department of Computer Science, Toronto, ON M5B 2K3, Canada
Keywords
Disentangled representation; Transformer; controllable person synthesis; NETWORKS;
DOI
10.1109/TMM.2023.3345180
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
In this paper, we propose a novel framework, DRL-CPG, that learns disentangled latent representations for controllable person image generation. It can produce realistic person images with desired poses and human attributes (e.g., pose, head, upper clothes, and pants) taken from various source persons. Unlike existing works that leverage semantic masks to obtain the representation of each component, we generate disentangled latent codes via a novel transformer-based attribute encoder trained with curriculum learning, progressing from a relatively easy step to gradually harder ones. A random component mask-agnostic strategy randomly removes component masks from the person segmentation masks; this increases the training difficulty and encourages the transformer encoder to recognize the underlying boundaries between components, which enables the model to transfer both the shape and the texture of each component. Furthermore, we propose a novel attribute decoder network that integrates multi-level attributes (e.g., the structure feature and the attribute representation) through well-designed Dual Adaptive Denormalization (DAD) residual blocks. Extensive experiments demonstrate that the proposed approach can transfer both the texture and the shape of different human parts and yields realistic results. To our knowledge, we are the first to learn disentangled latent representations with transformers for person image generation.
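The abstract describes the random component mask-agnostic strategy only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes each body component is given as a separate binary mask in a dictionary, that "removing" a component mask means withholding it from the encoder input, and that the easy-to-hard curriculum can be expressed as a removal probability rising linearly over training. The names `drop_probability`, `random_component_mask_drop`, and `max_p`, the linear schedule, and the mask representation are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation: each body component maps to a binary mask
# (1 inside the component, 0 elsewhere). Not taken from the paper.
Mask = List[List[int]]


def drop_probability(epoch: int, total_epochs: int, max_p: float = 0.5) -> float:
    """Hypothetical curriculum schedule: start with no masks removed (easy)
    and raise the removal probability linearly to max_p (hard)."""
    if total_epochs <= 1:
        return max_p
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return max_p * progress


def random_component_mask_drop(
    masks: Dict[str, Mask], p: float, rng: random.Random
) -> Tuple[Dict[str, Mask], List[str]]:
    """Withhold each component mask independently with probability p.

    Returns the masks that are kept and the names of the removed components;
    the encoder would then have to locate the removed components' boundaries
    without mask guidance.
    """
    kept: Dict[str, Mask] = {}
    dropped: List[str] = []
    for name, mask in masks.items():
        if rng.random() < p:
            dropped.append(name)
        else:
            kept[name] = mask
    return kept, dropped


# Usage with toy 1x3 masks for three of the components named in the abstract.
rng = random.Random(0)
masks = {"head": [[1, 0, 0]], "upper_clothes": [[0, 1, 0]], "pants": [[0, 0, 1]]}
p = drop_probability(epoch=5, total_epochs=11)  # halfway through training: 0.25
kept, dropped = random_component_mask_drop(masks, p, rng)
```

The linear schedule is only one possible way to move from easy to hard; the abstract does not state whether the curriculum is tied to mask removal, so the coupling shown here is an illustrative assumption.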
Pages: 6065-6077
Number of pages: 13