PARTS: Unsupervised segmentation with slots, attention and independence maximization

Cited by: 10
Authors
Zoran, Daniel [1 ]
Kabra, Rishabh [1 ]
Lerchner, Alexander [1 ]
Rezende, Danilo J. [1 ]
Affiliation
[1] DeepMind, London, England
DOI
10.1109/ICCV48922.2021.01027
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
From an early age, humans perceive the visual world as composed of coherent objects with distinctive properties such as shape, size, and color. There is great interest in building models that are able to learn similar structure, ideally in an unsupervised manner. Learning such structure from complex 3D scenes that include clutter, occlusions, interactions, and camera motion is still an open challenge. We present a model that is able to segment visual scenes from complex 3D environments into distinct objects, learn disentangled representations of individual objects, and form consistent and coherent predictions of future frames, in a fully unsupervised manner. Our model (named PARTS) builds on recent approaches that utilize iterative amortized inference and transition dynamics for deep generative models. We achieve dramatic improvements in performance by introducing several novel contributions. We introduce a recurrent slot-attention-like encoder which allows for top-down influence during inference. We argue that when inferring scene structure from image sequences it is better to use a fixed prior which is shared across the sequence rather than an auto-regressive prior, as is often used in prior work. We demonstrate our model's success on three different video datasets (the popular benchmark CLEVRER; a simulated 3D Playroom environment; and a real-world Robotics Arm dataset). Finally, we analyze the contributions of the various model components and the representations learned by the model.
Pages: 10419-10427
Number of pages: 9