Compositional Scene Representation Learning via Reconstruction: A Survey

Cited by: 7
Authors
Yuan, Jinyang [1]
Chen, Tonglin [1]
Li, Bin [1]
Xue, Xiangyang [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Autoencoders; compositional scene representations; image reconstruction; neural networks; object-centric learning;
DOI
10.1109/TPAMI.2023.3286184
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed to apply deep neural networks, which have been proven to be advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including development history and categorizations of existing methods from the perspectives of the modeling of visual scenes and the inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the limitations of existing methods and future directions of this research topic.
Pages: 11540-11560
Page count: 21
Related papers
50 in total
  • [1] SAR Nonsparse Scene Reconstruction Network via Image Feature Representation Learning
    Yang, Jianyu
    Zuo, Haowen
    An, Hongyang
    Jiang, Ruili
    Li, Zhongyu
    Sun, Zhichao
    Wu, Junjie
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 15
  • [2] Representation Learning via Manifold Flattening and Reconstruction
    Psenka, Michael
    Pai, Druv
    Raman, Vishal
    Sastry, Shankar
    Ma, Yi
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 47
  • [3] Representation Learning of Compositional Data
    Avalos-Fernandez, Marta
    Nock, Richard
    Ong, Cheng Soon
    Rouar, Julien
    Sun, Ke
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [4] Generative Modeling of Infinite Occluded Objects for Compositional Scene Representation
    Yuan, Jinyang
    Li, Bin
    Xue, Xiangyang
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [5] Scene Consistency Representation Learning for Video Scene Segmentation
    Wu, Haoqian
    Chen, Keyu
    Luo, Yanan
    Qiao, Ruizhi
    Ren, Bo
    Liu, Haozhe
    Xie, Weicheng
    Shen, Linlin
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 14001 - 14010
  • [6] Representation Learning for Scene Graph Completion via Jointly Structural and Visual Embedding
    Wan, Hai
    Luo, Yonghao
    Peng, Bo
    Zheng, Wei-Shi
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 949 - 956
  • [7] Representation Learning for Semantic Scene Understanding
    Farshad, Azade
    HHAI 2023: AUGMENTING HUMAN INTELLECT, 2023, 368 : 445 - 458
  • [8] Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
    Brachmann, Eric
    Wynn, Jamie
    Chen, Shuai
    Cavallari, Tommaso
    Monszpart, Aron
    Turmukhambetov, Daniyar
    Prisacariu, Victor Adrian
    COMPUTER VISION - ECCV 2024, PT LVI, 2025, 15114 : 421 - 440
  • [9] Bridging Continuous and Discrete Spaces: Interpretable Sentence Representation Learning via Compositional Operations
    Huang, James Y.
    Yao, Wenlin
    Song, Kaiqiang
    Zhang, Hongming
    Chen, Muhao
    Yu, Dong
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 14584 - 14595
  • [10] Scene Reconstruction via Coherency Imaging
    El-Halawany, Ahmed
    Beckus, Andre
    Kondakci, H. Esat
    Monroe, Morgan
    Mohammadian, Nafiseh
    Atia, George K.
    Abouraddy, Ayman F.
    30TH ANNUAL CONFERENCE OF THE IEEE PHOTONICS SOCIETY (IPC), 2017, : 605 - 606