Compositional Scene Representation Learning via Reconstruction: A Survey

Citations: 7
Authors
Yuan, Jinyang [1]
Chen, Tonglin [1]
Li, Bin [1]
Xue, Xiangyang [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Autoencoders; compositional scene representations; image reconstruction; neural networks; object-centric learning
DOI
10.1109/TPAMI.2023.3286184
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed to apply deep neural networks, which have been proven to be advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including development history and categorizations of existing methods from the perspectives of the modeling of visual scenes and the inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the limitations of existing methods and future directions of this research topic.
Pages: 11540-11560
Page count: 21