Compositional Scene Representation Learning via Reconstruction: A Survey

Cited by: 7

Authors:

Yuan, Jinyang [1]

Chen, Tonglin [1]

Li, Bin [1]

Xue, Xiangyang [1]

Affiliations:

[1] Fudan University, School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Shanghai 200433, People's Republic of China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023 / Vol. 45 / No. 10

Funding:

National Natural Science Foundation of China;

Keywords:

Autoencoders; compositional scene representations; image reconstruction; neural networks; object-centric learning;

DOI:

10.1109/TPAMI.2023.3286184

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed to apply deep neural networks, which have been proven to be advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including development history and categorizations of existing methods from the perspectives of the modeling of visual scenes and the inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the limitations of existing methods and future directions of this research topic.

Pages: 11540-11560

Page count: 21
