Toward a Visual Concept Vocabulary for GAN Latent Space

Cited by: 1
Authors
Schwettmann, Sarah [1]
Hernandez, Evan [2]
Bau, David [2]
Klein, Samuel [3]
Andreas, Jacob [2]
Torralba, Antonio [2]
Affiliations
[1] MIT, BCS, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] MIT, CSAIL, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] MIT, KFG, 77 Massachusetts Ave, Cambridge, MA 02139 USA
DOI
10.1109/ICCV48922.2021.00673
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A large body of recent work has identified transformations in the latent spaces of generative adversarial networks (GANs) that consistently and interpretably transform generated images. But existing techniques for identifying these transformations rely either on a fixed vocabulary of prespecified visual concepts or on unsupervised disentanglement techniques whose alignment with human judgments about perceptual salience is unknown. This paper introduces a new method for building open-ended vocabularies of primitive visual concepts represented in a GAN's latent space. Our approach is built from three components: (1) automatic identification of perceptually salient directions based on their layer selectivity; (2) human annotation of these directions with free-form, compositional natural language descriptions; and (3) decomposition of these annotations into a visual concept vocabulary, consisting of distilled directions labeled with single words. Experiments show that concepts learned with our approach are reliable and composable: they generalize across classes, contexts, and observers, and enable fine-grained manipulation of image style and content.
Pages: 6784-6792
Page count: 9