Compositional Scene Representation Learning via Reconstruction: A Survey

Cited by: 7

Authors:

Yuan, Jinyang [1]

Chen, Tonglin [1]

Li, Bin [1]

Xue, Xiangyang [1]

Affiliations:

[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023, Vol. 45, No. 10

Funding:

National Natural Science Foundation of China

Keywords:

Autoencoders; compositional scene representations; image reconstruction; neural networks; object-centric learning;

DOI:

10.1109/TPAMI.2023.3286184

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed to apply deep neural networks, which have been proven to be advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including development history and categorizations of existing methods from the perspectives of the modeling of visual scenes and the inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the limitations of existing methods and future directions of this research topic.


Pages: 11540-11560

Page count: 21
