Compositional Scene Representation Learning via Reconstruction: A Survey

Cited by: 7
Authors
Yuan, Jinyang [1]
Chen, Tonglin [1]
Li, Bin [1]
Xue, Xiangyang [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Autoencoders; compositional scene representations; image reconstruction; neural networks; object-centric learning;
DOI
10.1109/TPAMI.2023.3286184
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed to apply deep neural networks, which have been proven to be advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including development history and categorizations of existing methods from the perspectives of the modeling of visual scenes and the inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the limitations of existing methods and future directions of this research topic.
Pages: 11540 - 11560
Page count: 21
Related papers
50 in total
  • [41] Self-Supervised Time Series Representation Learning via Cross Reconstruction Transformer
    Zhang, Wenrui
    Yang, Ling
    Geng, Shijia
    Hong, Shenda
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (11) : 16129 - 16138
  • [42] Unsupervised seismic reconstruction via deep learning with one-dimensional signal representation
    Chen, Gui
    Liu, Yang
    Zhang, Mi
    Sun, Yuhang
    Zhang, Haoran
    COMPUTERS & GEOSCIENCES, 2025, 200
  • [43] Active Scene Understanding via Online Semantic Reconstruction
    Zheng, Lintao
    Zhu, Chenyang
    Zhang, Jiazhao
    Zhao, Hang
    Huang, Hui
    Niessner, Matthias
    Xu, Kai
    COMPUTER GRAPHICS FORUM, 2019, 38 (07) : 103 - 114
  • [44] Looking Closer at the Scene: Multiscale Representation Learning for Remote Sensing Image Scene Classification
    Wang, Qi
    Huang, Wei
    Xiong, Zhitong
    Li, Xuelong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (04) : 1414 - 1428
  • [45] UNSUPERVISED LEARNING OF COMPOSITIONAL SPARSE CODE FOR NATURAL IMAGE REPRESENTATION
    Hong, Yi
    Si, Zhangzhang
    Hu, Wenze
    Zhu, Song-Chun
    Wu, Ying Nian
    QUARTERLY OF APPLIED MATHEMATICS, 2014, 72 (02) : 373 - 406
  • [46] Visual Scene Reconstruction Using a Bayesian Learning Framework
    Bourouis, Sami
    Bouguila, Nizar
    Li, Yexing
    Azam, Muhammad
    IMAGE AND SIGNAL PROCESSING (ICISP 2018), 2018, 10884 : 225 - 232
  • [47] Rule-Guided Compositional Representation Learning on Knowledge Graphs
    Niu, Guanglin
    Zhang, Yongfei
    Li, Bo
    Cui, Peng
    Liu, Si
    Li, Jingyang
    Zhang, Xiaowei
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 2950 - 2958
  • [48] CORL: Compositional Representation Learning for Few-Shot Classification
    He, Ju
    Kortylewski, Adam
    Yuille, Alan
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023 : 3879 - 3888
  • [49] Deep video representation learning: a survey
    Ravanbakhsh, Elham
    Liang, Yongqing
    Ramanujam, J.
    Li, Xin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (20) : 59195 - 59225
  • [50] Survey on Trajectory Representation Learning Techniques
    Cao H.-L.
    Tang H.-N.
    Wang F.
    Xu Y.-J.
    Ruan Jian Xue Bao/Journal of Software, 2021, 32 (05) : 1461 - 1479