PARTS: Unsupervised segmentation with slots, attention and independence maximization

Cited by: 10

Authors:

Zoran, Daniel [1]

Kabra, Rishabh [1]

Lerchner, Alexander [1]

Rezende, Danilo J. [1]

Affiliations:

[1] DeepMind, London, England

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

Keywords:

DOI:

10.1109/ICCV48922.2021.01027

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

From an early age, humans perceive the visual world as composed of coherent objects with distinctive properties such as shape, size, and color. There is great interest in building models that are able to learn similar structure, ideally in an unsupervised manner. Learning such structure from complex 3D scenes that include clutter, occlusions, interactions, and camera motion is still an open challenge. We present a model that is able to segment visual scenes from complex 3D environments into distinct objects, learn disentangled representations of individual objects, and form consistent and coherent predictions of future frames, in a fully unsupervised manner. Our model (named PARTS) builds on recent approaches that utilize iterative amortized inference and transition dynamics for deep generative models. We achieve dramatic improvements in performance by introducing several novel contributions. We introduce a recurrent slot-attention-like encoder, which allows for top-down influence during inference. We argue that when inferring scene structure from image sequences, it is better to use a fixed prior that is shared across the sequence than an auto-regressive prior, as is often used in prior work. We demonstrate our model's success on three different video datasets (the popular benchmark CLEVRER, a simulated 3D Playroom environment, and a real-world Robotics Arm dataset). Finally, we analyze the contributions of the various model components and the representations learned by the model.


Pages: 10419-10427

Page count: 9

