PARTS: Unsupervised segmentation with slots, attention and independence maximization

Cited by: 10
Authors
Zoran, Daniel [1]
Kabra, Rishabh [1]
Lerchner, Alexander [1]
Rezende, Danilo J. [1]
Affiliations
[1] DeepMind, London, England
DOI
10.1109/ICCV48922.2021.01027
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
From an early age, humans perceive the visual world as composed of coherent objects with distinctive properties such as shape, size, and color. There is great interest in building models that are able to learn similar structure, ideally in an unsupervised manner. Learning such structure from complex 3D scenes that include clutter, occlusions, interactions, and camera motion is still an open challenge. We present a model that is able to segment visual scenes from complex 3D environments into distinct objects, learn disentangled representations of individual objects, and form consistent and coherent predictions of future frames, in a fully unsupervised manner. Our model (named PARTS) builds on recent approaches that utilize iterative amortized inference and transition dynamics for deep generative models. We achieve dramatic improvements in performance by introducing several novel contributions. We introduce a recurrent slot-attention-like encoder which allows for top-down influence during inference. We argue that when inferring scene structure from image sequences, it is better to use a fixed prior which is shared across the sequence rather than an auto-regressive prior, as often used in prior work. We demonstrate our model's success on three different video datasets (the popular benchmark CLEVRER; a simulated 3D Playroom environment; and a real-world Robotics Arm dataset). Finally, we analyze the contributions of the various model components and the representations learned by the model.
Pages: 10419-10427
Page count: 9
Related papers
50 in total
  • [1] Unsupervised Image Segmentation Based on Expectation-Maximization Algorithm
    Guan, Ji-shi
    Shi, Yao-wu
    Qiu, Jian-wen
    Hou, Yi-min
    [J]. 2015 INTERNATIONAL CONFERENCE ON APPLIED MECHANICS AND MECHATRONICS ENGINEERING (AMME 2015), 2015: 506-510
  • [2] Unsupervised Word Segmentation from Speech with Attention
    Godard, Pierre
    Boito, Marcely Zanon
    Ondel, Lucas
    Berard, Alexandre
    Yvon, Francois
    Villavicencio, Aline
    Besacier, Laurent
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 2678-2682
  • [3] Unsupervised Image Segmentation by Mutual Information Maximization and Adversarial Regularization
    Mirsadeghi, S. Ehsan
    Royat, Ali
    Rezatofighi, Hamid
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6(04): 6931-6938
  • [4] Unsupervised hierarchical image segmentation through fuzzy entropy maximization
    Yin, Shibai
    Qian, Yiming
    Gong, Minglun
    [J]. PATTERN RECOGNITION, 2017, 68: 245-259
  • [5] Expectation-Maximization Attention Networks for Semantic Segmentation
    Li, Xia
    Zhong, Zhisheng
    Wu, Jianlong
    Yang, Yibo
    Lin, Zhouchen
    Liu, Hong
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 9166-9175
  • [6] Unsupervised image segmentation utilizing penalized inverse expectation maximization algorithm
    Khan, Jesmin F.
    Adhami, Reza R.
    Bhuiyan, Sharif M. A.
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008: 937-940
  • [7] Deep expectation-maximization network for unsupervised image segmentation and clustering
    Pu, Yannan
    Sun, Jian
    Tang, Niansheng
    Xu, Zongben
    [J]. IMAGE AND VISION COMPUTING, 2023, 135
  • [8] Joint Attention Mechanism for Unsupervised Video Object Segmentation
    Yao, Rui
    Xu, Xin
    Zhou, Yong
    Zhao, Jiaqi
    Fang, Liang
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PT I, 2021, 13019: 154-165
  • [9] Asymmetric Attention Fusion for Unsupervised Video Object Segmentation
    Jiang, Hongfan
    Wu, Xiaojun
    Xu, Tianyang
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI, 2024, 14430: 170-182
  • [10] Unsupervised Domain Adaptation for Cardiac Segmentation: Towards Structure Mutual Information Maximization
    Lu, Changjie
    Zheng, Shen
    Gupta, Gaurav
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 2587-2596