Compositional Scene Representation Learning via Reconstruction: A Survey

Times cited: 7
Authors
Yuan, Jinyang [1]
Chen, Tonglin [1]
Li, Bin [1]
Xue, Xiangyang [1]
Affiliation
[1] Fudan University, School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Shanghai 200433, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Autoencoders; compositional scene representations; image reconstruction; neural networks; object-centric learning
DOI
10.1109/TPAMI.2023.3286184
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason humans learn efficiently from diverse visual scenes is their capacity for compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed that apply deep neural networks, which have proven advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it can exploit massive amounts of unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including the development history and categorizations of existing methods from the perspectives of how visual scenes are modeled and how scene representations are inferred; we then provide benchmarks of representative methods that consider the most extensively studied problem setting and form the foundation for other methods, together with an open-source toolbox to reproduce the benchmark experiments; and finally we discuss the limitations of existing methods and future directions of this research topic.
Pages: 11540-11560
Number of pages: 21